Abstract. We study the computational complexity of Nash equilibria in
concurrent games with limit-average objectives. In particular, we prove that the
existence of a Nash equilibrium in randomised strategies is undecidable, while
the existence of a Nash equilibrium in pure strategies is decidable, even if we put
a constraint on the payoff of the equilibrium. Our undecidability result holds even
for a restricted class of concurrent games, where nonzero rewards occur only on
terminal states. Moreover, we show that the constrained existence problem is
undecidable not only for concurrent games but for turn-based games with the same
restriction on rewards. Finally, we prove that the constrained existence problem
for Nash equilibria in (pure or randomised) stationary strategies is decidable and
analyse its complexity.

1 Introduction

Concurrent games provide a versatile model for the interaction of several
components in a distributed system where the components perform actions in
parallel [17]. Classically, such a system is modelled by a family of concurrent
two-player games, one for each component, where one component tries to fulfil
its specification against the coalition of all other components. In practice, this
modelling is often too pessimistic because it ignores the specifications of the
other components. We argue that a distributed system is more faithfully
modelled by a multiplayer game where each player has her own objective, which is
independent of the other players' objectives.

Another objection to the classical theory of verification and synthesis has 
been that specifications are qualitative: either the specification is fulfilled, or 
it is violated. Examples of such specifications include reachability properties,
where a certain set of target states has to be reached, or safety properties, where
a certain set of states has to be avoided. In practice, many specifications are 
of a quantitative nature, examples of which include minimising average power 
consumption or maximising average throughput. Specifications of the latter 
kind can be expressed by assigning (positive or negative) rewards to states or 
transitions and considering the limit-average reward gained from an infinite play. 
In fact, concurrent games where a player's payoff is defined in such a way have 
been a central topic in game theory (see the related work section below). 

The most common solution concept for games with multiple players is that 
of a Nash equilibrium [20]. In a Nash equilibrium, no player can improve her
payoff by changing her strategy unilaterally. Unfortunately, Nash equilibria do
not always exist in concurrent games, and if they exist, they may not be unique.
In applications, one might look for an equilibrium where some players receive a
high payoff while other players receive a low payoff. Formulated as a decision
problem, given a game with $k$ players and thresholds
$\bar{x}, \bar{y} \in (\mathbb{Q} \cup \{\pm\infty\})^k$,
we want to know whether the game has a Nash equilibrium whose payoff lies
between $\bar{x}$ and $\bar{y}$; we call this decision problem NE.

The problem NE comes in several variants, depending on the type of strategies 
one considers: On the one hand, strategies may be randomised (allowing random- 
isation over actions) or pure (not allowing such randomisation). On the other 
hand, one can restrict to strategies that use finite memory or even to stationary 
strategies, which only depend on the last state. Indeed, we show that these 
restrictions give rise to distinct decision problems, which have to be analysed 
separately. 

Our results show that the complexity of NE highly depends on the type of 
strategies that realise the equilibrium. In particular, we prove the following 
results, which yield an almost complete picture of the complexity of NE:

1. NE for pure stationary strategies (or pure strategies with bounded memory) 
is NP-complete. 

2. NE for stationary strategies (or randomised strategies with bounded memory) 
is decidable in PSPACE, but hard for both NP and SQRTSUM.

3. NE for arbitrary pure strategies is NP-complete. 

4. NE for arbitrary randomised strategies is undecidable. 

All of our lower bounds for NE and, in particular, our undecidability result
hold already for a subclass of concurrent games where Nash equilibria are
guaranteed to exist, namely for turn-based games. If this assumption is relaxed
and Nash equilibria are not guaranteed to exist, we prove that even the plain
existence problem for Nash equilibria is undecidable. Moreover, many of our
lower bounds hold already for games where non-zero rewards only occur on
terminal states, and thus also for games where each player wants to maximise
the total sum of the rewards.

As a byproduct of our decidability proof for pure strategies, we give a
polynomial-time algorithm for deciding whether in a multi-weighted graph
there exists a path whose limit-average weight vector lies between two given
thresholds, a result that is of independent interest. For instance, our algorithm
can be used for deciding the emptiness of a multi-threshold mean-payoff language [2]
in polynomial time.

RELATED WORK. Concurrent and, more generally, stochastic games go back 
to Shapley [24], who proved the existence of the value for discounted two-player 
zero-sum games. This result was later generalised by Fink [13] who proved 
that every discounted game has a Nash equilibrium. Gillette [16] introduced
limit-average objectives, and Mertens & Neyman [19] proved the existence of the
value for stochastic two-player zero-sum games with limit-average objectives.
Unfortunately, as demonstrated by Everett [12], these games do not, in general,
admit a Nash equilibrium (see Example 1). However, Vieille [29, 30] proved
that, for all $\varepsilon > 0$, every two-player stochastic limit-average game
admits an $\varepsilon$-equilibrium, i.e. a pair of strategies where each player
can gain at most $\varepsilon$ from switching her strategy. Whether such
equilibria always exist in games with more than two players is an important open
question [21].

Determining the complexity of Nash equilibria has attracted much interest 
in recent years. In particular, a series of papers culminated in the result that 
computing a Nash equilibrium of a finite two-player game in strategic form is 
complete for the complexity class PPAD [6, 8]. The constrained existence problem, 
where one looks for a Nash equilibrium with certain properties, has also been 
investigated for games in strategic form. In particular, Conitzer & Sandholm [7] 
showed that deciding whether there exists a Nash equilibrium whose payoff 
exceeds a given threshold and related decision problems are NP-complete for 
two-player games in strategic form. 

For concurrent games with limit-average objectives, most algorithmic results 
concern two-player zero-sum games. In the turn-based case, these games are 
commonly known as mean-payoff games [10, 32]. While it is known that the value 
of such a game can be computed in pseudo-polynomial time, it is still open 
whether there exists a polynomial-time algorithm for solving mean-payoff games. 
A related model are multi-dimensional mean-payoff games where one player tries 
to maximise several mean-payoff conditions at the same time [5]. In particular, 
Velner & Rabinovich [28] showed that the value problem for these games is 
coNP-complete. 

One subclass of concurrent games with limit-average objectives that has 
been studied in the multiplayer setting are concurrent games with reachability 
objectives. In particular, Bouyer et al. [3] showed that the constrained existence 
problem for Nash equilibria is NP-complete for these games (see also [26, 14]). 
We extend their result to limit-average objectives. However, we assume that
strategies can observe actions (a common assumption in game theory), which
they do not. Hence, while our result is more general w.r.t. the type of objectives
we consider, their result is more general w.r.t. the type of strategies they allow.

In a recent paper [27], we studied the complexity of Nash equilibria in 
stochastic games with reachability objectives. In particular, we proved that NE 
for pure strategies is undecidable in this setting. Since we prove here that this
problem is decidable in the non-stochastic setting, this undecidability result can 
be explained by the presence of probabilistic transitions in stochastic games. 
On the other hand, we prove in this paper that randomisation in strategies also 
leads to undecidability, a question that was left open in [27]. 



2 Concurrent Games 

Concurrent games are played by finitely many players on a finite state space. 
Formally, a concurrent game is given by 

- a finite nonempty set $\Pi$ of players, e.g. $\Pi = \{0, 1, \dots, k-1\}$,

- a finite nonempty set $S$ of states,

- for each player $i$ and each state $s$ a nonempty set $\Gamma_i(s)$ of actions
taken from a finite set $\Gamma$,

- a transition function $\delta\colon S \times \Gamma^\Pi \to S$,

- for each player $i \in \Pi$ a reward function $r_i\colon S \to \mathbb{R}$.

For computational purposes, we assume that all rewards are rational numbers
with numerator and denominator given in binary. We say that an action profile
$\bar{a} = (a_i)_{i \in \Pi}$ is legal at state $s$ if $a_i \in \Gamma_i(s)$ for
each $i \in \Pi$. Finally, we call a state $s$ controlled by player $i$ if
$|\Gamma_j(s)| = 1$ for all $j \neq i$, and we say that a game
is turn-based if each state is controlled by (at least) one player. For turn-based
games, an action of the controlling player prescribes to go to a certain state.
Hence, we will usually omit actions in turn-based games.
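
The definition above can be mirrored in a small data structure. The following is only a sketch under our own encoding assumptions: the class name `ConcurrentGame`, its field names and the use of Python tuples for action profiles are hypothetical and not part of the formal model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class ConcurrentGame:
    # Hypothetical encoding of a concurrent game (names are illustrative only).
    players: list   # the set Pi, in a fixed order
    states: list    # the set S
    legal: dict     # (player, state) -> nonempty set of actions Gamma_i(s)
    delta: dict     # (state, action profile as tuple) -> successor state
    reward: dict    # (player, state) -> rational reward r_i(s)

    def is_legal(self, state, profile):
        """A profile is legal at `state` if each component a_i is in Gamma_i(s)."""
        return all(a in self.legal[(i, state)]
                   for i, a in zip(self.players, profile))

    def legal_profiles(self, state):
        """Enumerate all legal action profiles at `state`."""
        return product(*(sorted(self.legal[(i, state)]) for i in self.players))

    def controllers(self, state):
        """Players i such that every other player has exactly one legal action."""
        return [i for i in self.players
                if all(len(self.legal[(j, state)]) == 1
                       for j in self.players if j != i)]

    def is_turn_based(self):
        """Turn-based: every state is controlled by at least one player."""
        return all(self.controllers(s) for s in self.states)
```

Note that a state where every player has a single legal action is controlled by all players at once, which is why the definition says "(at least) one player".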

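For a tuple $\bar{x} = (x_i)_{i \in \Pi}$, where the elements $x_i$ belong to an
arbitrary set $X$, and an element $x \in X$, we denote by $\bar{x}_{-i}$ the
restriction of $\bar{x}$ to $\Pi \setminus \{i\}$ and by $(\bar{x}_{-i}, x)$ the
unique tuple $\bar{y} \in X^\Pi$ with $y_i = x$ and $\bar{y}_{-i} = \bar{x}_{-i}$.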

A play of a game $\mathcal{G}$ is an infinite sequence
$s_0 \bar{a}_0 s_1 \bar{a}_1 \dots \in (S \cdot \Gamma^\Pi)^\omega$ such that
$\delta(s_j, \bar{a}_j) = s_{j+1}$ for all $j \in \mathbb{N}$. For each player, a
play $\pi = s_0 \bar{a}_0 s_1 \bar{a}_1 \dots$ gives rise to an infinite sequence
of rewards. There are different criteria to evaluate this
sequence and map it to a payoff. In this paper, we consider the limit-average
(or mean-payoff) criterion, where the payoff of $\pi$ for player $i$ is defined by

$$\phi_i(\pi) := \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} r_i(s_j).$$

Note that this payoff mapping is prefix-independent, i.e. $\phi_i(\pi) = \phi_i(\pi')$
if $\pi'$ is a suffix of $\pi$. An important special case are games where non-zero
rewards occur only on terminal states, i.e. states $s$ with $\delta(s, \bar{a}) = s$
for all (legal) $\bar{a} \in \Gamma^\Pi$. These
games were introduced by Everett [12] under the name recursive games, but we
prefer to call them terminal-reward games. Hence, in a terminal-reward game,
$\phi_i(\pi) = r_i(s)$ if $\pi$ enters a terminal state $s$ and $\phi_i(\pi) = 0$ otherwise.
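
For plays that are ultimately periodic (a finite prefix followed by a repeated cycle), prefix-independence means that the limit-average payoff is simply the mean reward on the cycle. The sketch below illustrates this; the function names and the representation of a play as two lists of states are our own assumptions, not notation from the text.

```python
from fractions import Fraction


def limit_average(prefix, cycle, reward):
    """Limit-average payoff of the play prefix . cycle^omega.

    By prefix-independence, only the repeated cycle matters.
    """
    if not cycle:
        raise ValueError("the repeated part of the play must be nonempty")
    return sum(Fraction(reward[s]) for s in cycle) / len(cycle)


def running_averages(prefix, cycle, reward, n_max):
    """Finite averages (1/n) * sum_{j<n} r(s_j) for n = 1..n_max, for comparison."""
    total, averages = Fraction(0), []
    for n in range(n_max):
        if n < len(prefix):
            s = prefix[n]
        else:
            s = cycle[(n - len(prefix)) % len(cycle)]
        total += Fraction(reward[s])
        averages.append(total / (n + 1))
    return averages
```

In a terminal-reward game, a play that enters a terminal state `s` has the one-element cycle `[s]`, so `limit_average` returns the reward of `s`, as stated above.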

Often, it is convenient to designate an initial state. An initialised game is thus
a tuple $(\mathcal{G}, s_0)$ where $\mathcal{G}$ is a concurrent game and $s_0$ is
one of its states.

STRATEGIES AND STRATEGY PROFILES. For a finite set $X$, we denote by $\mathcal{D}(X)$
the set of probability distributions over $X$. A (randomised) strategy for player $i$
in $\mathcal{G}$ is a mapping $\sigma\colon (S \cdot \Gamma^\Pi)^* \cdot S \to \mathcal{D}(\Gamma)$
assigning to each possible history $xs \in (S \cdot \Gamma^\Pi)^* \cdot S$ a
probability distribution $\sigma(xs)$ over actions such that
$\sigma(xs)(a) > 0$ only if $a \in \Gamma_i(s)$. We write $\sigma(a \mid xs)$ for
the probability assigned to $a \in \Gamma$ by the distribution $\sigma(xs)$. A
(randomised) strategy profile of $\mathcal{G}$ is a tuple
$\bar\sigma = (\sigma_i)_{i \in \Pi}$ of strategies in $\mathcal{G}$, one for each
player. Note that a strategy profile can be identified with a function
$\bar\sigma\colon (S \cdot \Gamma^\Pi)^* \cdot S \to \mathcal{D}(\Gamma)^\Pi$.

A strategy $\sigma$ for player $i$ is called pure if for each
$xs \in (S \cdot \Gamma^\Pi)^* \cdot S$ the distribution $\sigma(xs)$ is degenerate,
i.e. there exists $a \in \Gamma_i(s)$ with $\sigma(a \mid xs) = 1$. Note
that a pure strategy can be identified with a function
$\sigma\colon (S \cdot \Gamma^\Pi)^* \cdot S \to \Gamma$. A strategy profile
$\bar\sigma = (\sigma_i)_{i \in \Pi}$ is called pure if each $\sigma_i$ is pure, in
which case we can identify $\bar\sigma$ with a mapping
$(S \cdot \Gamma^\Pi)^* \cdot S \to \Gamma^\Pi$. Note that, given an initial state
$s_0$ and a pure strategy profile $\bar\sigma$, there exists a unique play
$\pi = s_0 \bar{a}_0 s_1 \bar{a}_1 \dots$ such that
$\bar\sigma(s_0 \bar{a}_0 \dots \bar{a}_{j-1} s_j) = \bar{a}_j$ for all
$j \in \mathbb{N}$; we call $\pi$ the play induced by $\bar\sigma$ from $s_0$.

A memory structure for $\mathcal{G}$ is a triple $\mathfrak{M} = (M, \delta, m_0)$,
where $M$ is a set of memory states, $\delta\colon M \times S \times \Gamma^\Pi \to M$
is the update function, and $m_0 \in M$ is the initial memory.
A (randomised) strategy with memory $\mathfrak{M}$ for player $i$ is a function
$\sigma\colon M \times S \to \mathcal{D}(\Gamma)$ such that $\sigma(m, s)(a) > 0$
only if $a \in \Gamma_i(s)$. The strategy $\sigma$ is pure if the
distribution $\sigma(m, s)$ is degenerate for all $m \in M$ and $s \in S$. A (pure)
strategy $\sigma$ with memory $\mathfrak{M}$ can be viewed as a (pure) strategy
$\sigma'$ in the usual sense by setting $\sigma'(xs) = \sigma(\delta^*(x), s)$,
where $\delta^*(x)$ is defined inductively by $\delta^*(\varepsilon) = m_0$
and $\delta^*(x \cdot s\bar{a}) = \delta(\delta^*(x), s, \bar{a})$. A finite-state
strategy is a strategy $\sigma$ with finite memory $\mathfrak{M}$. If the memory
$\mathfrak{M}$ is a singleton, we call $\sigma$ stationary. Moreover, we
call a strategy positional if it is both pure and stationary. A stationary strategy
can thus be represented by a mapping $\sigma\colon S \to \mathcal{D}(\Gamma)$, and a
positional strategy by a mapping $\sigma\colon S \to \Gamma$. Finally, we call a
strategy profile finite-state, stationary or positional if each strategy in the
profile has the respective property.
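
The passage from a strategy with memory to a strategy on histories can be sketched as follows. The dictionary-based encoding (an update table keyed by memory state, game state and action profile, and distributions as dictionaries from actions to probabilities) is an assumption made for illustration only.

```python
def extended_update(update, m0, history):
    """Compute delta*(x) by folding the update function over the history x.

    `history` is a list of (state, action profile) pairs; the empty history
    yields the initial memory m0.
    """
    m = m0
    for state, profile in history:
        m = update[(m, state, profile)]
    return m


def as_history_strategy(sigma, update, m0):
    """Lift sigma: (memory, state) -> distribution to sigma'(xs) = sigma(delta*(x), s)."""
    def sigma_prime(history, state):
        return sigma[(extended_update(update, m0, history), state)]
    return sigma_prime


def is_pure(sigma):
    """A strategy with memory is pure if every distribution is degenerate."""
    return all(max(dist.values()) == 1 for dist in sigma.values())
```

With a single memory state the update table is irrelevant and `sigma` depends on the current game state only, which is exactly the stationary case.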

THE PROBABILITY MEASURE INDUCED BY A STRATEGY PROFILE. Given an initial
state $s_0 \in S$ and a strategy profile $\bar\sigma = (\sigma_i)_{i \in \Pi}$, the
conditional probability of $\bar{a} \in \Gamma^\Pi$
given the history $xs \in (S \cdot \Gamma^\Pi)^* \cdot S$ equals

$$\bar\sigma(\bar{a} \mid xs) := \prod_{i \in \Pi} \sigma_i(a_i \mid xs).$$

The probabilities $\bar\sigma(\bar{a} \mid xs)$ induce a probability measure on the
Borel $\sigma$-algebra over $(S \cdot \Gamma^\Pi)^\omega$ as follows: The
probability of a basic open set
$s_1 \bar{a}_1 \dots s_n \bar{a}_n \cdot (S \cdot \Gamma^\Pi)^\omega$ equals the
product $\prod_{j=1}^{n} \bar\sigma(\bar{a}_j \mid s_1 \bar{a}_1 \dots \bar{a}_{j-1} s_j)$
if $s_1 = s_0$ and $\delta(s_j, \bar{a}_j) = s_{j+1}$ for all $1 \le j < n$; in all
other cases, this probability is 0. By Carathéodory's
extension theorem, this extends to a unique probability measure assigning a
probability to every Borel subset of $(S \cdot \Gamma^\Pi)^\omega$, which we denote
by $\Pr^{\bar\sigma}_{s_0}$. Via the natural projection
$(S \cdot \Gamma^\Pi)^\omega \to S^\omega$, we obtain a probability measure on the
Borel $\sigma$-algebra over $S^\omega$. We abuse notation and denote this measure
also by $\Pr^{\bar\sigma}_{s_0}$; it should always be clear from the context which
measure we are referring to.
Finally, we denote by $E^{\bar\sigma}_{s_0}$ the expectation operator that
corresponds to $\Pr^{\bar\sigma}_{s_0}$, i.e.
$E^{\bar\sigma}_{s_0}(f) = \int f \, d\Pr^{\bar\sigma}_{s_0}$ for all Borel
measurable functions $f\colon (S \cdot \Gamma^\Pi)^\omega \to \mathbb{R} \cup \{\pm\infty\}$
or $f\colon S^\omega \to \mathbb{R} \cup \{\pm\infty\}$. In particular, we are
interested in the quantities $p_i := E^{\bar\sigma}_{s_0}(\phi_i)$. We call $p_i$
the (expected) payoff of $\bar\sigma$ for player $i$ and the vector
$(p_i)_{i \in \Pi}$ the (expected) payoff of $\bar\sigma$. Finally, we call a
history $x \in (S \cdot \Gamma^\Pi)^* \cdot S$ consistent with $\bar\sigma$ if
$\Pr^{\bar\sigma}_{s_0}(x \cdot (S \cdot \Gamma^\Pi)^\omega) > 0$.
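
For a stationary profile, the probability of a basic open set is a finite product and can be computed directly. The following sketch assumes, purely for illustration, that each stationary strategy is a dictionary from states to dictionaries of action probabilities and that the transition function is a dictionary keyed by (state, action profile).

```python
from fractions import Fraction


def profile_probability(sigmas, state, profile):
    """Probability of an action profile at `state`: product of sigma_i(a_i | s)."""
    p = Fraction(1)
    for sigma_i, a_i in zip(sigmas, profile):
        p *= Fraction(sigma_i[state].get(a_i, 0))
    return p


def cylinder_probability(sigmas, delta, s0, history):
    """Probability of the basic open set s_1 a_1 ... s_n a_n (S Gamma^Pi)^omega.

    `history` is the list [(s_1, a_1), ..., (s_n, a_n)]; the empty list stands
    for the whole space and gets probability 1.
    """
    if not history:
        return Fraction(1)
    if history[0][0] != s0:
        return Fraction(0)
    p = Fraction(1)
    for j, (state, profile) in enumerate(history):
        # Consecutive entries must respect the transition function.
        if j + 1 < len(history) and delta.get((state, profile)) != history[j + 1][0]:
            return Fraction(0)
        p *= profile_probability(sigmas, state, profile)
    return p
```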

In order to apply known results about Markov chains, we can also view
the stochastic process induced by a strategy profile as a countable Markov
chain $\mathcal{G}^{\bar\sigma}$, defined as follows: The set of states of
$\mathcal{G}^{\bar\sigma}$ equals the set $(S \cdot \Gamma^\Pi)^* \cdot S$
of histories of $\mathcal{G}$. The only transitions from a state $xs$ lead to
states of the form $xs\bar{a}t$ where $t = \delta(s, \bar{a})$, and such a
transition occurs with probability $\bar\sigma(\bar{a} \mid xs)$.

For each player $i$, the Markov decision process $\mathcal{G}^{\bar\sigma_{-i}}$ has
the same states as $\mathcal{G}^{\bar\sigma}$, and there is a transition from a
state $xs$ to a state $xs\bar{a}t$ with action $a \in \Gamma_i(s)$ and
probability $p$ if $a_i = a$, $\delta(s, \bar{a}) = t$ and
$p = \prod_{j \neq i} \sigma_j(a_j \mid xs)$. Finally, the reward of a
state $xs$ in $\mathcal{G}^{\bar\sigma_{-i}}$ equals the reward $r_i(s)$ of the
state $s$ for player $i$ in $\mathcal{G}$.

If $\bar\sigma$ is a strategy profile with finite memory $\mathfrak{M}$, we make
$\mathcal{G}^{\bar\sigma}$ and $\mathcal{G}^{\bar\sigma_{-i}}$ finite by
quotienting the state space w.r.t. the equivalence relation $\sim$, defined by
$xs \sim yt$ if $s = t$ and $\delta^*(x) = \delta^*(y)$. In particular, if
$\bar\sigma$ is stationary, then the state spaces of $\mathcal{G}^{\bar\sigma}$
and $\mathcal{G}^{\bar\sigma_{-i}}$ coincide with the state space of $\mathcal{G}$.
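
In the stationary case, the finite Markov chain can be written down explicitly by summing the probabilities of all action profiles that lead to the same successor. The sketch below does this under assumed encodings (per-player dictionaries of legal actions and of stationary strategies); the function name `induced_chain` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def induced_chain(states, legal, delta, sigmas):
    """Transition probabilities of the chain induced by a stationary profile.

    legal[i][s]  : set of legal actions of the i-th player at state s
    sigmas[i][s] : dictionary action -> probability for the i-th player at s
    Returns P with P[s][t] = sum of prod_i sigma_i(a_i | s) over all legal
    profiles a with delta(s, a) = t.
    """
    chain = {}
    for s in states:
        row = {}
        for profile in product(*(sorted(acts[s]) for acts in legal)):
            p = Fraction(1)
            for sigma_i, a_i in zip(sigmas, profile):
                p *= Fraction(sigma_i[s].get(a_i, 0))
            if p > 0:
                t = delta[(s, profile)]
                row[t] = row.get(t, Fraction(0)) + p
        chain[s] = row
    return chain
```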

DRAWING CONCURRENT GAMES. When drawing a concurrent game as a graph,
we will adhere to the following conventions: States are usually depicted as
circles, but terminal states are depicted as squares. The initial state is marked by
a dangling incoming edge. An edge from $s$ to $t$ with label $\bar{a}$ means that
$\delta(s, \bar{a}) = t$ and that $\bar{a}$ is legal at $s$. However, the label
$\bar{a}$ might be omitted if it is not essential.
In turn-based games, the player who controls a state is indicated by the label
next to it. Finally, a label of the form $i\colon x$ next to state $s$ indicates
that $r_i(s) = x$; if this reward is 0, the label will usually be omitted.

Figure 1. A terminal-reward game that has no Nash equilibrium.

Figure 2. A limit-average game that has no Nash equilibrium.

3 Nash Equilibria 

To capture rational behaviour of selfish players, Nash [20] introduced the notion
of what is now called a Nash equilibrium. Formally, given a game $\mathcal{G}$ and
an initial state $s_0$, a strategy $\tau$ for player $i$ is a best response to a
strategy profile $\bar\sigma$ if $\tau$ maximises the expected payoff for player
$i$, i.e.

$$E^{\bar\sigma_{-i}, \tau'}_{s_0}(\phi_i) \le E^{\bar\sigma_{-i}, \tau}_{s_0}(\phi_i)$$

for all strategies $\tau'$ for player $i$. A strategy profile
$\bar\sigma = (\sigma_i)_{i \in \Pi}$ is a Nash equilibrium
of $(\mathcal{G}, s_0)$ if for each player $i$ the strategy $\sigma_i$ is a best
response to $\bar\sigma$. Hence, in a
Nash equilibrium no player can improve her payoff by (unilaterally) switching
to a different strategy. As the following examples demonstrate, Nash equilibria
are not guaranteed to exist in concurrent games.

Example 1. Consider the terminal-reward game $\mathcal{G}_1$ depicted in Figure 1 and
played by players 1 and 2, which was originally presented in [9]. We claim that
$(\mathcal{G}_1, s_1)$ does not have a Nash equilibrium. First note that, for each
$\varepsilon > 0$, player 1 can ensure a payoff of $1 - 2\varepsilon$ by the
stationary strategy that selects action $b$ with probability $\varepsilon$. Hence,
every Nash equilibrium $(\sigma, \tau)$ of $(\mathcal{G}_1, s_1)$ must have payoff
$(1, -1)$. Now we distinguish whether $\sigma(b \mid (s_1(a,a))^k s_1) = 0$ for all
$k \in \mathbb{N}$ or not. In the first case, there must exist $k \in \mathbb{N}$
such that $\tau(b \mid (s_1(a,a))^k s_1) > 0$
(otherwise $(\sigma, \tau)$ would not have payoff $(1, -1)$). But then player 2 can
improve her payoff by always playing action $a$ with probability 1, a contradiction
to $(\sigma, \tau)$ being a Nash equilibrium. In the second case, consider the least
$k$ such that $p := \sigma(b \mid (s_1(a,a))^k s_1) > 0$. By choosing action $b$
with probability 1 for the history $(s_1(a,a))^k s_1$ and choosing action $a$ with
probability 1 for all other histories, player 2 can ensure payoff $p$, again a
contradiction to $(\sigma, \tau)$ being a Nash equilibrium.

Example 2. A variation of the previous game is the game $\mathcal{G}_2$, which is
depicted in Figure 2 and also played by players 1 and 2. It is not a terminal-reward
game, but the only rewards that occur in the game are 0 and 1. Using almost the
same argumentation as in Example 1, we can show that $(\mathcal{G}_2, s_1)$ has no
Nash equilibrium either.

Figure 3. A game with no pure Nash equilibrium where player 0 wins with positive
probability.

Figure 4. A game with no stationary Nash equilibrium where player 0 wins with
positive probability.

It follows from Nash's theorem [20] that every game whose arena is a tree 
(or a DAG) has a Nash equilibrium. Another important special case of concurrent 
limit-average games where Nash equilibria always exist are turn-based games.
For these games, Thuijsman & Raghavan [25] proved not only the existence of 
arbitrary Nash equilibria but of pure finite-state ones. 

To measure the complexity of Nash equilibria in concurrent games, we intro- 
duce the following decision problem, which we call NE: 

Given a game $\mathcal{G}$, a state $s_0$ and thresholds
$\bar{x}, \bar{y} \in (\mathbb{Q} \cup \{\pm\infty\})^\Pi$, decide
whether $(\mathcal{G}, s_0)$ has a Nash equilibrium with payoff $\ge \bar{x}$ and
$\le \bar{y}$.

Note that we have not put any restriction on the type of strategies that realise the
equilibrium. It is natural to restrict the search space to profiles of pure, stationary
or positional strategies. These restrictions give rise to different decision problems,
which we call PureNE, StatNE and PosNE, respectively.

Before we analyse the complexity of these problems, let us convince ourselves 
that these problems are not just different faces of the same coin. We first show 
that the decision problems where we look for equilibria in randomised strategies 
are distinct from the ones where we look for equilibria in pure strategies. 

Proposition 3. There exists a turn-based terminal-reward game that has a
stationary Nash equilibrium where player 0 receives payoff 1 but that has no pure
Nash equilibrium where player 0 receives payoff > 0.

Proof. Consider the game depicted in Figure 3 and played by three players 0, 1
and 2. Clearly, the stationary strategy profile where at state $s_2$ player 0 selects
both outgoing transitions with probability 1/2 each, player 1 plays from $s_0$ to
$s_1$ and player 2 plays from $s_1$ to $s_2$ is a Nash equilibrium where player 0
receives payoff 1. However, in any pure strategy profile where player 0 receives
payoff > 0, either player 1 or player 2 receives payoff 0 and could improve her
payoff by switching her strategy at $s_0$ or $s_1$, respectively. □

Now we show that it makes a difference whether we look for an equilibrium
in stationary strategies or not.

Proposition 4. There exists a turn-based terminal-reward game that has a pure
Nash equilibrium where player 0 receives payoff 1 but that has no stationary
Nash equilibrium where player 0 receives payoff > 0.

Proof. Consider the game $\mathcal{G}$ depicted in Figure 4 and played by three
players 0, 1 and 2. Clearly, the pure strategy profile that leads to the terminal
state with payoff 1 for player 0 and where player 0 plays "right" if player 1 has
deviated and "left" if player 2 has deviated is a Nash equilibrium of
$(\mathcal{G}, s_0)$ with payoff 1 for player 0. Now consider any stationary
equilibrium of $(\mathcal{G}, s_0)$ where player 0 receives payoff > 0. If the
stationary strategy of player 0 prescribes to play "right"
with positive probability, then player 2 can improve her payoff by playing to $s_2$
with probability 1, and otherwise player 1 can improve her payoff by playing
to $s_2$ with probability 1, a contradiction. □

It follows from Proposition 3 that NE and StatNE are different from PureNE and
PosNE, and it follows from Proposition 4 that NE and PureNE are different from
StatNE and PosNE. Hence, all of these decision problems are pairwise distinct,
and their decidability and complexity have to be studied separately.

4 Positional Strategies 

In this section, we show that the problem PosNE is NP-complete; we start by 
proving the upper bound.

Theorem 5. PosNE is in NP. 

Proof. To decide PosNE on input $\mathcal{G}$, $s_0$, $\bar{x}$, $\bar{y}$, we start
by guessing a positional strategy profile $\bar\sigma$ of $\mathcal{G}$, i.e.
mappings $\sigma_i\colon S \to \Gamma$ such that $\sigma_i(s) \in \Gamma_i(s)$ for all
$i \in \Pi$ and $s \in S$. Then, we verify whether $\bar\sigma$ is a Nash equilibrium
with the desired payoff. To do this, we first compute the payoff $z_i$ of
$\bar\sigma$ for each player $i$ by computing the number
$E^{\bar\sigma}_{s_0}(\phi_i)$ in the finite Markov chain $\mathcal{G}^{\bar\sigma}$.
Since $\mathcal{G}^{\bar\sigma}$ is deterministic, this number equals the average
weight (for player $i$) on the unique simple cycle reachable from $s_0$ and can thus
be computed in polynomial time.
Once each $z_i$ is computed, we can easily check whether $x_i \le z_i \le y_i$. To
verify that $\bar\sigma$ is a Nash equilibrium, we additionally compute, for each
player $i$, the value $v_i$ of the finite MDP $\mathcal{G}^{\bar\sigma_{-i}}$ from
$s_0$. This number can be computed by identifying the highest average weight (for
player $i$) on a simple cycle reachable in $\mathcal{G}^{\bar\sigma_{-i}}$ from
$s_0$, which can also be done in polynomial time [18]. Clearly, $\bar\sigma$ is a
Nash equilibrium if and only if $v_i \le z_i$ for each player $i$. □
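
The verification step of this proof can be sketched in code. The version below is restricted to turn-based games, where a positional profile is just a choice of successor per state; this restriction, the dictionary encoding and all function names are our own assumptions. The highest average weight of a reachable cycle is computed with a dynamic programme in the style of Karp's mean-cycle algorithm, which is one standard polynomial-time method and not necessarily the one of [18].

```python
from fractions import Fraction


def cycle_payoff(step, reward, s0):
    """Average reward on the unique cycle reached from s0 when every state s
    has exactly one successor step[s]."""
    seen, path, s = {}, [], s0
    while s not in seen:
        seen[s] = len(path)
        path.append(s)
        s = step[s]
    cycle = path[seen[s]:]
    return sum(Fraction(reward[t]) for t in cycle) / len(cycle)


def max_mean_cycle(succ, reward, s0):
    """Highest average reward of a cycle reachable from s0 (Karp-style DP).
    Leaving state s earns reward[s]; every state needs at least one successor."""
    reach, stack = {s0}, [s0]
    while stack:
        for t in succ[stack.pop()]:
            if t not in reach:
                reach.add(t)
                stack.append(t)
    n = len(reach)
    # best[k][v]: maximal total reward of a walk with exactly k steps from s0 to v
    best = [{s0: Fraction(0)}]
    for _ in range(n):
        layer = {}
        for u, w in best[-1].items():
            for v in succ[u]:
                cand = w + Fraction(reward[u])
                if v not in layer or cand > layer[v]:
                    layer[v] = cand
        best.append(layer)
    return max(
        min((best[n][v] - best[k][v]) / (n - k) for k in range(n) if v in best[k])
        for v in best[n]
    )


def is_positional_ne(owner, succ, rewards, profile, s0, lo, hi):
    """Check lo[i] <= z_i <= hi[i] and v_i <= z_i for every player i."""
    for i in rewards:
        z = cycle_payoff(profile, rewards[i], s0)
        # Player i may deviate only at the states she controls.
        dev = {s: (succ[s] if owner[s] == i else [profile[s]]) for s in succ}
        if not (lo[i] <= z <= hi[i]) or max_mean_cycle(dev, rewards[i], s0) > z:
            return False
    return True
```

Here `dev` plays the role of the MDP of player i: all other players are fixed to the guessed profile, so the MDP degenerates to a graph and its value is the best reachable cycle average.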

A result by Chatterjee et al. [5, Lemma 15] implies that PosNE is NP-hard,
even for turn-based games with rewards taken from $\{-1, 0, 1\}$ (but with an
unbounded number of players). We strengthen their result by showing that the
problem remains NP-hard if there are only three players and rewards are taken
from $\{0, 1\}$.

Theorem 6. PosNE is NP-hard, even for turn-based three-player games with
rewards 0 and 1.

Proof. We reduce from the Hamiltonian cycle problem. Given a graph $G = (V, E)$,
we define a turn-based three-player game $\mathcal{G}$ as follows: the set of states
is $V$, all states are controlled by player 0, and the transition function
corresponds to $E$ (i.e. $\Gamma_0(v) = vE$ and $\delta(v, \bar{a}) = w$ if and only
if $a_0 = w$). Let $n = |V|$ and $v_0 \in V$. Player 0 receives reward 1 in each
state. The reward of state $v_0$ to player 1 equals 1; all other states have reward
0 for player 1. Finally, player 2 receives reward 0 at $v_0$ and reward 1 at all
other states. We show that there is a Hamiltonian cycle in $G$ if and only if
$(\mathcal{G}, v_0)$ has a positional Nash equilibrium
with payoff $\ge (1, 1/n, (n-1)/n)$.

(⇒) Let $\pi = \pi(0)\pi(1) \dots \pi(n)$ be a Hamiltonian cycle that starts (and
ends) in $\pi(0) = v_0 = \pi(n)$. Consider the positional strategy $\sigma$ of
player 0 that plays from $\pi(i)$ to $\pi(i+1)$ for all $i < n$. The induced play
from $v_0$ is the play $(\pi(0)\pi(1) \dots \pi(n-1))^\omega$, which gives payoff 1
to player 0, payoff $1/n$ to player 1 and payoff $(n-1)/n$ to player 2. Moreover,
it is obvious that we have a Nash equilibrium.

(⇐) Let $\pi$ be the play induced by a positional Nash equilibrium of
$(\mathcal{G}, v_0)$ with payoff $\ge (1, 1/n, (n-1)/n)$. Since $\pi$ corresponds to
a positional strategy profile and gives player 1 a positive payoff, $\pi$ has the
form $\pi = (v_0 v_1 \dots v_{i-1})^\omega$, where $1 \le i \le n$ and
$v_0 \dots v_{i-1} v_0$ is a simple cycle of $G$. Hence, the payoff of $\pi$
for player 2 equals $(i-1)/i$. This number is at least $(n-1)/n$ only if $i \ge n$.
Hence, $i = n$ and $v_0 \dots v_{i-1} v_0$ is a Hamiltonian cycle. □
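
The payoff computation behind this reduction is easy to replay on small graphs. The sketch below builds the three reward functions from the proof and evaluates the payoff vector of a play that repeats a given simple cycle through $v_0$; the function names and the edge-list encoding of the graph are assumptions made for illustration.

```python
from fractions import Fraction


def hamiltonian_game(vertices, edges, v0):
    """Game from the reduction: player 0 owns every state and moves along edges."""
    succ = {v: [w for (u, w) in edges if u == v] for v in vertices}
    rewards = {
        0: {v: 1 for v in vertices},                  # player 0: reward 1 everywhere
        1: {v: 1 if v == v0 else 0 for v in vertices},  # player 1: reward 1 only at v0
        2: {v: 0 if v == v0 else 1 for v in vertices},  # player 2: reward 1 except at v0
    }
    return succ, rewards


def cycle_payoffs(cycle, rewards):
    """Payoff vector of the play cycle^omega (mean reward over the cycle)."""
    return tuple(
        sum(Fraction(rewards[i][v]) for v in cycle) / len(cycle) for i in (0, 1, 2)
    )


def meets_threshold(cycle, rewards, n):
    """Does cycle^omega have payoff at least (1, 1/n, (n-1)/n)?"""
    target = (Fraction(1), Fraction(1, n), Fraction(n - 1, n))
    return all(p >= t for p, t in zip(cycle_payoffs(cycle, rewards), target))
```

A cycle of length $i$ through $v_0$ yields the payoff vector $(1, 1/i, (i-1)/i)$, so the threshold is met exactly when $i = n$, in line with the argument above.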

By combining our reduction with a game that has no positional Nash equilibrium,
we can prove the following stronger result for non-turn-based games.

Corollary 7. Deciding the existence of a positional Nash equilibrium in a
concurrent limit-average game is NP-complete, even for three-player games with
rewards 0 and 1.

Proof. Membership in NP follows from Theorem 5. To prove hardness, we reduce
from the following problem, whose NP-hardness follows from the proof of
Theorem 6: Given a three-player game $(\mathcal{G}, s_0)$ with rewards 0 and 1 and
$n \in \mathbb{N}$ (given in unary), decide whether $(\mathcal{G}, s_0)$ has a
positional Nash equilibrium with payoff $\ge (1, 1/n, (n-1)/n)$. From
$\mathcal{G}$, we construct a new game $\mathcal{G}'$, which
employs the game $\mathcal{G}_2$ from Example 2 and is depicted in Figure 5; we set
the reward for player 0 in all states of $\mathcal{G}_2$ to 1. Note that we can
simulate the fractional rewards in the terminal state by a cycle of $n$ states with
rewards 0 and 1. We claim that $(\mathcal{G}', s'_0)$ has a positional Nash
equilibrium if and only if $(\mathcal{G}, s_0)$ has a positional Nash equilibrium
with payoff $\ge (1, 1/n, (n-1)/n)$.

Figure 5. The game $\mathcal{G}'$. The terminal state carries the rewards
0: 0, 1: $1/n$, 2: $(n-1)/n$.

(⇒) Let $\bar\sigma$ be a positional Nash equilibrium of $(\mathcal{G}', s'_0)$.
Since $(\mathcal{G}_2, s_1)$ does not have a Nash equilibrium, the induced play must
either enter the game $\mathcal{G}$ or end at the terminal state with payoff 0 for
player 0. But the latter case is impossible since then player 0 could improve her
payoff by playing action $b$ at $s'_0$. Hence, the induced play enters
$\mathcal{G}$, and $\bar\sigma$ is also a Nash equilibrium of $(\mathcal{G}, s_0)$.
Moreover, $\bar\sigma$ must have payoff at least $(1, 1/n, (n-1)/n)$ since otherwise
player 1 or player 2 could improve her payoff by playing action $b$ at $s'_1$.

(⇐) Let $\bar\sigma$ be a positional Nash equilibrium of $(\mathcal{G}, s_0)$ with
payoff at least $(1, 1/n, (n-1)/n)$. We can extend $\bar\sigma$ to a positional Nash
equilibrium of $(\mathcal{G}', s'_0)$ by setting
$\bar\sigma(s'_0) = \bar\sigma(s'_0 (a,a,a) s'_1) = (a, a, a)$. □



5 Stationary Strategies 

To prove the decidability of StatNE, we appeal to results established for the
existential theory of the reals, the set of all existential first-order sentences
(over the appropriate signature) that hold in the ordered field
$\mathfrak{R} := (\mathbb{R}, +, \cdot, 0, 1, \le)$. The
best known upper bound for the complexity of the associated decision problem
is PSPACE [4], which leads to the following theorem.

Theorem 8. StatNE is in PSPACE.

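Proof. To prove membership in PSPACE, we show that there is a polynomial-time
procedure that on input $\mathcal{G}$, $s_0$, $\bar{x}$, $\bar{y}$ returns an
existential first-order sentence $\psi$ such that $(\mathcal{G}, s_0)$ has a
stationary Nash equilibrium with payoff $\ge \bar{x}$ and $\le \bar{y}$ if
and only if $\psi$ holds in $\mathfrak{R}$. What does $\psi$ look like? Let
$\bar\alpha = (\alpha^i_{s,a})_{i \in \Pi, s \in S, a \in \Gamma}$,
$\bar{v} = (v^i_s)_{i \in \Pi, s \in S}$, $\bar{b} = (b_s)_{s \in S}$ and
$\bar{z} = (z^i_s)_{i \in \Pi, s \in S}$ be four sets of variables. The formula

$$\varphi_i(\bar\alpha) := \bigwedge_{s \in S} \Big( \sum_{a \in \Gamma} \alpha^i_{s,a} = 1
  \wedge \bigwedge_{a \in \Gamma_i(s)} \alpha^i_{s,a} \ge 0
  \wedge \bigwedge_{a \in \Gamma \setminus \Gamma_i(s)} \alpha^i_{s,a} = 0 \Big)$$

states that the mapping $\sigma_i\colon S \to \mathbb{R}^\Gamma$, defined by
$\sigma_i(s)\colon a \mapsto \alpha^i_{s,a}$, is indeed a
stationary strategy for player $i$. Provided that each $\varphi_i(\bar\alpha)$ holds
in $\mathfrak{R}$, the formula

$$\eta_i(\bar\alpha, \bar{z}) := \exists \bar{b} \Big(
  \bigwedge_{s \in S} b_s + z^i_s = r_i(s) + \sum_{\bar{a} \in \Gamma^\Pi} b_{\delta(s,\bar{a})} \cdot \prod_{j \in \Pi} \alpha^j_{s,a_j}
  \;\wedge\; \bigwedge_{s \in S} z^i_s = \sum_{\bar{a} \in \Gamma^\Pi} z^i_{\delta(s,\bar{a})} \cdot \prod_{j \in \Pi} \alpha^j_{s,a_j} \Big)$$

states that $z^i_s = E^{\bar\sigma}_s(\phi_i)$ for all $s \in S$, where
$\bar\sigma = (\sigma_i)_{i \in \Pi}$ (see [22, Theorem 8.2.6]).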
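Finally, the formula

$$\theta_i(\bar\alpha, \bar{v}) := \exists \bar{b} \Big(
  \bigwedge_{s \in S} \bigwedge_{a \in \Gamma_i(s)} v^i_s \ge \sum_{\bar{a} \in \Gamma^\Pi,\, a_i = a} v^i_{\delta(s,\bar{a})} \cdot \prod_{j \neq i} \alpha^j_{s,a_j}
  \;\wedge\; \bigwedge_{s \in S} \bigwedge_{a \in \Gamma_i(s)} v^i_s + b_s \ge r_i(s) + \sum_{\bar{a} \in \Gamma^\Pi,\, a_i = a} b_{\delta(s,\bar{a})} \cdot \prod_{j \neq i} \alpha^j_{s,a_j} \Big)$$

states that $\bar{v}$ is a solution of the linear programme for computing the values
of the MDP $\mathcal{G}^{\bar\sigma_{-i}}$ (see [22, Section 9.3]), i.e. the formula
is fulfilled if and only if
$v^i_s \ge \sup_\tau E^{\bar\sigma_{-i}, \tau}_s(\phi_i)$ for all $s \in S$.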

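The desired sentence $\psi$ is the existential closure of the conjunction of the
formulae $\varphi_i$, $\eta_i$ and $\theta_i$ combined with formulae stating that
player $i$ cannot improve her payoff and that the expected payoff for player $i$
lies between the given thresholds:

$$\psi := \exists \bar\alpha \, \exists \bar{v} \, \exists \bar{z} \bigwedge_{i \in \Pi}
  \big( \varphi_i(\bar\alpha) \wedge \eta_i(\bar\alpha, \bar{z}) \wedge \theta_i(\bar\alpha, \bar{v})
  \wedge v^i_{s_0} \le z^i_{s_0} \wedge x_i \le z^i_{s_0} \le y_i \big).$$

Clearly, $\psi$ can be constructed in polynomial time from $\mathcal{G}$, $s_0$,
$\bar{x}$ and $\bar{y}$. Moreover, $\psi$ holds in $\mathfrak{R}$ if and only if
$(\mathcal{G}, s_0)$ has a stationary Nash equilibrium with payoff
at least $\bar{x}$ and at most $\bar{y}$. □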

The next theorem shows that StatNE is NP-hard, even for turn-based games
with rewards 0 and 1. Note that this does not follow from the NP-hardness of
PosNE, but requires a different proof.

Theorem 9. StatNE is NP-hard, even for turn-based games with rewards 0 and 1.

Proof. We employ a reduction from SAT, which resembles a reduction in [26].
Given a Boolean formula $\varphi = C_1 \wedge \dots \wedge C_m$ in conjunctive normal
form over propositional variables $X_1, \dots, X_n$, where w.l.o.g. $m \ge 1$ and
each clause is nonempty, we build a turn-based game $\mathcal{G}$ played by players
$0, 1, \dots, n$ as follows: The game $\mathcal{G}$ has states $C_1, \dots, C_m$
controlled by player 0 and for each clause $C$ and each literal $L$ that occurs in
$C$ a state $(C, L)$, controlled by player $i$ if $L = X_i$ or $L = \neg X_i$;
additionally, the game contains a terminal state $\bot$. There are transitions
from a clause $C_j$ to each state $(C_j, L)$ such that $L$ occurs in $C_j$ and from
there to $C_{(j \bmod m)+1}$, and there is a transition from each state of the form
$(C, \neg X)$ to $\bot$. Each state except $\bot$ has reward 1 for player 0, whereas
$\bot$ has reward 0 for player 0. For player $i$, all states except states of the
form $(C, X_i)$ have reward 1; states of the form $(C, X_i)$ have reward 0. The
structure of $\mathcal{G}$ is depicted in Figure 6.
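
The construction can be written out as a short program. The sketch below assumes a DIMACS-like input (each clause is a list of nonzero integers, `+i` for $X_i$ and `-i` for $\neg X_i$) and hypothetical state names `("C", j)`, `("L", j, lit)` and `"bot"`; none of these encodings come from the text. The second function builds the positional profile used in the direction from a satisfying assignment to an equilibrium.

```python
def sat_game(clauses, n_vars):
    """Build the turn-based game of the reduction from a CNF formula."""
    m = len(clauses)
    owner, succ = {}, {"bot": ["bot"]}          # bot is terminal (self-loop)
    for j, clause in enumerate(clauses, start=1):
        owner[("C", j)] = 0                     # clause states belong to player 0
        succ[("C", j)] = [("L", j, lit) for lit in clause]
        for lit in clause:
            nxt = [("C", (j % m) + 1)]          # continue with the next clause
            if lit < 0:
                nxt.append("bot")               # negative literals may exit to bot
            owner[("L", j, lit)] = abs(lit)     # literal states belong to player i
            succ[("L", j, lit)] = nxt
    rewards = {0: {s: (0 if s == "bot" else 1) for s in succ}}
    for i in range(1, n_vars + 1):
        # Player i gets reward 0 exactly at states (C, X_i), and 1 elsewhere.
        rewards[i] = {
            s: (0 if s != "bot" and s[0] == "L" and s[2] == i else 1) for s in succ
        }
    return owner, succ, rewards


def profile_from_assignment(clauses, assignment):
    """Positional profile: player 0 picks a true literal in each clause and
    nobody moves to bot. Raises StopIteration if some clause has no true literal."""
    m = len(clauses)
    profile = {"bot": "bot"}
    for j, clause in enumerate(clauses, start=1):
        lit = next(l for l in clause if assignment[abs(l)] == (l > 0))
        profile[("C", j)] = ("L", j, lit)
        for l in clause:
            profile[("L", j, l)] = ("C", (j % m) + 1)
    return profile
```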

Figure 6. Reducing SAT to StatNE.

Clearly, $\mathcal{G}$ can be constructed from $\varphi$ in polynomial time. In order
to establish our reduction, we prove that the following statements are equivalent:

1. $\varphi$ is satisfiable.

2. $(\mathcal{G}, C_1)$ has a positional Nash equilibrium with payoff $\ge 1$ for player 0.

3. $(\mathcal{G}, C_1)$ has a stationary Nash equilibrium with payoff $\ge 1$ for player 0.

(1. ⇒ 2.) Assume that $\alpha\colon \{X_1, \dots, X_n\} \to \{\text{true}, \text{false}\}$
is a satisfying assignment for $\varphi$. We show that the positional strategy
profile $\bar\sigma$ where at any time player 0 plays from a clause $C$ to a fixed
state $(C, L)$ such that $L$ is mapped to true by $\alpha$ and each player
$i \neq 0$ never plays to $\bot$ is a Nash equilibrium of $(\mathcal{G}, C_1)$
with payoff 1 for player 0. First note that the induced play never reaches $\bot$.
Hence, player 0 receives payoff 1, which is the best payoff player 0 can get.

To show that $\bar\sigma$ is a Nash equilibrium, consider any player $i$ who receives
payoff < 1. Hence, a state of the form $(C, X_i)$ is visited in the induced play.
However, as player 0 plays according to the satisfying assignment, no state of
the form $(C', \neg X_i)$ is ever visited. Hence, player $i$ cannot improve her
payoff by playing to $\bot$.

(2. ⇒ 3.) Trivial.

(3. $\Rightarrow$ 1.) Assume that $(\mathcal{G}, C_1)$ has a stationary Nash equilibrium $\sigma$ with payoff
$\ge 1$ for player 0. Hence, the terminal state $\bot$ is reached with probability 0 in $\sigma$.
Consider the variable assignment $\alpha$ that maps $X_i$ to true if and only if player $i$
receives payoff $< 1$ from $\sigma$; we claim that $\alpha$ satisfies the formula. Consider
any clause $C$. By the construction of $\mathcal{G}$, there exists a literal $L \in C$ such that
$\sigma_0((C, L) \mid C) > 0$. If $L = X_i$, then $\mathrm{E}^{\sigma}_{C_1}(\phi_i) < 1$ and $\alpha$ maps $X_i$ to true, thus
satisfying $C$. If $L = \neg X_i$, then player $i$ must receive payoff 1 since otherwise
she could switch to the positional strategy $\tau$ that plays from $(C, L)$ to $\bot$; in the
strategy profile $(\sigma_{-i}, \tau)$ the state $\bot$ is visited with probability 1, which gives
payoff 1 to player $i$. Hence, $\alpha$ maps $X_i$ to false and satisfies $C$. □

By combining our reduction with the game from Example 1, we can prove
the following stronger result for concurrent games.

Corollary 10. Deciding the existence of a stationary Nash equilibrium in a
concurrent limit-average game with rewards 0 and 1 is NP-hard.

Proof. The proof is similar to the proof of Corollary 7. From a given concurrent
limit-average game $(\mathcal{G}, s_0)$ with rewards 0 and 1, we construct a new game
$(\mathcal{G}', s_0')$ such that $(\mathcal{G}', s_0')$ has a stationary Nash equilibrium if and only if $(\mathcal{G}, s_0)$
has a stationary Nash equilibrium with payoff at least 1 for player 0. The game $\mathcal{G}'$
is the disjoint union of $\mathcal{G}$, the game $\mathcal{G}_2$ from Example 2, and the state $s_0'$, which is
controlled by player 0. At $s_0'$ player 0 can either play to the initial state $s_0$ of $\mathcal{G}$ or
to the initial state $s_1$ of $\mathcal{G}_2$. Finally, we set the reward for player 0 in every state
of $\mathcal{G}_2$ to 1. □

So far we have shown that StatNE is contained in Pspace and hard for NP,
leaving a considerable gap between the two bounds. In order to gain a better
understanding of StatNE, we relate this problem to the square root sum problem
(SqrtSum), an important problem about numerical computations. Formally,
SqrtSum is the following decision problem: Given numbers $d_1, \dots, d_n, k \in \mathbb{N}$,
decide whether $\sum_{i=1}^{n} \sqrt{d_i} \ge k$. Recently, Allender et al. [1] showed that SqrtSum
belongs to the fourth level of the counting hierarchy, a slight improvement over the
previously known Pspace upper bound. However, it has been an open question
since the 1970s as to whether SqrtSum falls into the polynomial hierarchy [15, 11].
We give a polynomial-time reduction from SqrtSum to StatNE for turn-based
terminal-reward games. Hence, StatNE is at least as hard as SqrtSum, and
showing that StatNE resides inside the polynomial hierarchy would imply a
major breakthrough in understanding the complexity of numerical computations.
While our reduction is similar to the one in [27], it requires new techniques to
simulate stochastic states.
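To make the problem statement concrete, the following Python sketch evaluates an instance of SqrtSum numerically. It is only a heuristic check at a fixed working precision (the parameter digits and the function name are our own assumptions), not a decision procedure: instances where the sum is extremely close to $k$ may need more digits than any fixed bound, which is exactly where the difficulty of the problem lies.

```python
from decimal import Decimal, localcontext


def sqrt_sum_at_least(ds, k, digits=50):
    """Numerically test whether sum(sqrt(d) for d in ds) >= k.

    The comparison is carried out with `digits` significant decimal digits,
    so the answer is only reliable when the sum is not too close to k.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        total = sum((Decimal(d).sqrt() for d in ds), Decimal(0))
        return total >= k
```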

Theorem 11. SqrtSum is polynomial-time reducible to StatNE for turn-based
8-player terminal-reward games.

Before we state the reduction, let us first examine the game $\mathcal{G}(p)$, where
$0 \le p \le 1$, which is played by players $0, 1, \dots, 5$ and depicted in Figure 7.

Lemma 12. The maximal payoff player 1 receives in a stationary Nash equilibrium
of $(\mathcal{G}(p), s_1)$ where player 0 receives payoff $\ge 0$ equals $\sqrt{p}$.

Proof. Let $\sigma$ be a stationary strategy profile of $(\mathcal{G}(p), s_1)$ where player 0 receives
payoff $\ge 0$, and let $q_i = \sigma_0(v_i \mid u_i)$ be the probability that player 0 moves from
$u_i$ to $v_i$. We claim that $q := q_1 = q_2 = 1 - p$ if $\sigma$ is a Nash equilibrium. Let
$z = \mathrm{E}^{\sigma}_{v_1}(\phi_4)$ and $z' = \mathrm{E}^{\sigma}_{v_1}(\phi_5)$. Since $\sigma$ is a Nash equilibrium, we have $z \ge 1 - p$
and $z' \ge 1$ (otherwise player 4 or player 5 would prefer to leave the game at $r_2$
or $t_2$). On the other hand, since at every terminal state the sum of the rewards
for players 4 and 5 is at most $2 - p$, we have $z + z' \le 2 - p$. Hence, $z = 1 - p$
and $z' = 1$. Now consider the expected payoffs for players 4 and 5 from $r_1$:

$\mathrm{E}^{\sigma}_{r_1}(\phi_4) = (1 - q_1)(2 - p) + q_1 \cdot z = 2 - q_1 - p$;
$\mathrm{E}^{\sigma}_{r_1}(\phi_5) = q_1 \cdot z' = q_1$.

Since $\sigma$ is a Nash equilibrium, these numbers are bounded from below by 1
and $1 - p$, respectively (otherwise, player 4 or player 5 would leave the game at
$r_1$ or $t_1$). Hence, $q_1 = 1 - p$. The reasoning that $q_2 = 1 - p$ is analogous.

Figure 7. The game $\mathcal{G}(p)$.

In the following, assume without loss of generality that $0 < p < 1$ (otherwise
the statement of the lemma is trivial). For any stationary strategy profile $\sigma$ of $\mathcal{G}(p)$
where player 0 receives payoff $\ge 0$, let $x_1 = \sigma_0(s_2 \mid v_1)$ and $x_2 = \sigma_0(s_1 \mid v_2)$
be the probabilities that player 0 does not leave the game at $v_1$, respectively $v_2$.
Given $x_1$ and $x_2$, for $i = 1, 2$ we can compute the payoff $f_i(x_1, x_2) := \mathrm{E}^{\sigma}_{s_i}(\phi_{i+1})$
for player $i + 1$ from $s_i$ by

$f_i(x_1, x_2) = \dfrac{p + 2q(1 - x_i)}{1 - q^2 x_1 x_2}$.

To have a Nash equilibrium, it must be the case that $f_1(x_1, x_2), f_2(x_1, x_2) \ge 1$
since otherwise player 2 or player 3 would prefer to leave the game at $s_1$ or $s_2$,
respectively, which would give the respective player payoff 1 immediately. Vice
versa, if $f_1(x_1, x_2), f_2(x_1, x_2) \ge 1$ then $\sigma$ is a Nash equilibrium with expected
payoff

$f(x_1, x_2) := \dfrac{p + q x_1 p}{1 - q^2 x_1 x_2}$

for player 1. Hence, to determine the maximum payoff for player 1 in a stationary
Nash equilibrium where player 0 receives payoff $\ge 0$, we have to maximise
$f(x_1, x_2)$ under the constraints $f_1(x_1, x_2), f_2(x_1, x_2) \ge 1$ and $0 \le x_1, x_2 \le 1$. We
claim that the maximum is reached only if $x_1 = x_2$. If e.g. $x_1 > x_2$, then we can
achieve a higher payoff for player 1 by setting $x_2' := x_1$, and the constraints are
still satisfied:

$\dfrac{p + 2q(1 - x_2')}{1 - q^2 x_1 x_2'} = \dfrac{p + 2q(1 - x_1)}{1 - q^2 x_1^2} \ge \dfrac{p + 2q(1 - x_1)}{1 - q^2 x_1 x_2} \ge 1$.

Figure 8. Reducing SqrtSum to StatNE.



Hence, it suffices to maximise $f(x, x)$ subject to $f_1(x, x) \ge 1$ and $0 \le x \le 1$,
which is equivalent to maximising $f(x, x)$ subject to $(1 - p)x^2 - 2x + 1 \ge 0$
and $0 \le x \le 1$. The roots of the quadratic function are $(1 \pm \sqrt{p})/(1 - p)$,
but $(1 + \sqrt{p})/(1 - p) > 1$ for $p > 0$. Therefore, any solution $x$ must satisfy
$x \le x_0 := (1 - \sqrt{p})/(1 - p)$. Since $0 \le x_0 \le 1$ for $0 < p < 1$ and $f(x, x)$ is
strictly increasing on $[0, 1]$, the optimal solution is $x_0$, and the maximal payoff for
player 1 in a stationary Nash equilibrium of $(\mathcal{G}(p), s_1)$ where player 0 receives
payoff $\ge 0$ equals indeed

$f(x_0, x_0) = \dfrac{p + q x_0 p}{1 - q^2 x_0^2} = \dfrac{p}{1 - q x_0} = \dfrac{p}{1 - (1 - p) x_0} = \dfrac{p}{1 - (1 - \sqrt{p})} = \sqrt{p}$. □
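The closing computation can be checked numerically. The sketch below assumes that the payoff functions have the shape $f(x_1, x_2) = (p + q x_1 p)/(1 - q^2 x_1 x_2)$ and $f_i(x_1, x_2) = (p + 2q(1 - x_i))/(1 - q^2 x_1 x_2)$ with $q = 1 - p$, which is how we read the displays in the proof; the function names are ours.

```python
import math


def f(x1, x2, p):
    """Assumed payoff of player 1 for continuation probabilities x1, x2."""
    q = 1 - p
    return (p + q * x1 * p) / (1 - q * q * x1 * x2)


def f_i(i, x1, x2, p):
    """Assumed payoff of player i + 1 from s_i (i is 1 or 2)."""
    q = 1 - p
    xi = x1 if i == 1 else x2
    return (p + 2 * q * (1 - xi)) / (1 - q * q * x1 * x2)


def optimum(p):
    """Return (x0, f(x0, x0)) for 0 < p < 1, where x0 is the smaller root
    of (1 - p) x^2 - 2 x + 1 = 0."""
    x0 = (1 - math.sqrt(p)) / (1 - p)
    return x0, f(x0, x0, p)
```

For $p = 1/4$ this yields $x_0 = 2/3$ and payoff $1/2 = \sqrt{p}$; the constraint $f_1(x_0, x_0) \ge 1$ is tight there and is violated for slightly larger $x$.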



Proof (of Theorem 11). Given an instance $(d_1, \dots, d_n, k)$ of SqrtSum, where w.l.o.g.
$n > 0$, $d_i > 0$ for each $i = 1, \dots, n$, and $d := \sum_{i=1}^{n} d_i$, we construct a turn-based
8-player terminal-reward game $(\mathcal{G}, s)$ such that $(\mathcal{G}, s)$ has a stationary Nash
equilibrium with payoff $\ge (0, \frac{k}{d(n+1)}, 0, \dots, 0)$ if and only if $\sum_{i=1}^{n} \sqrt{d_i} \ge k$. Define
$p_i := d_i/d^2$ for $i = 1, \dots, n$. For the reduction, we use $n$ copies of the game $\mathcal{G}(p)$,
where in the $i$th copy we set $p$ to $p_i$; in each copy, we set the rewards to player 6
and player 7 at all terminal states to 1 and 0, respectively. The complete game $\mathcal{G}$
is depicted in Figure 8; it can obviously be constructed in polynomial time.
We claim that in any (stationary) Nash equilibrium of $(\mathcal{G}, s_n)$ where player 0
receives payoff $\ge 0$ the probability of reaching the game $\mathcal{G}(p_i)$ equals $1/(n + 1)$
for all $i = 1, \dots, n$. First note that in any such equilibrium the state $s_0$ must
be reached with positive probability since otherwise player 7 would prefer to
leave the game at one of the states $r_i$, giving player 0 payoff $< 0$. Now let $\sigma$
be a stationary Nash equilibrium of $(\mathcal{G}, s_n)$ where player 0 receives payoff $\ge 0$,
and set $q_i := \sigma_0(s_{i-1} \mid t_i)$. By induction on $i$, we prove that $q_i = i/(i + 1)$. For
$i = 1$, this is true because if $q_1 > \frac{1}{2}$ then player 6 would prefer to leave the game
at $s_1$, and if $q_1 < \frac{1}{2}$ then player 7 would prefer to leave the game at $r_1$. Now let
$i > 1$ and assume that $q_j = j/(j + 1)$ for all $j < i$. A simple calculation reveals
that the expected payoffs for player 6 and player 7 from $s_{i-1}$ equal $(i - 1)/i$
and $(n + 1)/i$, respectively. Hence, the expected payoff for player 6 from state $t_i$
equals

$1 - q_i + q_i \cdot \frac{i - 1}{i} = 1 - \frac{q_i}{i}$.

If $q_i > i/(i + 1)$, then this number would be strictly smaller than $i/(i + 1)$, and
player 6 would be better off by leaving the game at $s_i$. On the other hand, the
expected payoff for player 7 from state $t_i$ equals $q_i (n + 1)/i$. If $q_i < i/(i + 1)$,
then this number would be strictly smaller than $(n + 1)/(i + 1)$, and player 7
would prefer to leave the game at $r_i$. In both cases, we have a contradiction to
$\sigma$ being a Nash equilibrium. Hence, $q_i = i/(i + 1)$ for all $i = 1, \dots, n$, and the
probability of reaching the game $\mathcal{G}(p_i)$ from $s_n$ equals

$(1 - q_i) \prod_{j=i+1}^{n} q_j = \left(1 - \frac{i}{i+1}\right) \prod_{j=i+1}^{n} \frac{j}{j+1} = \frac{1}{i+1} \cdot \frac{i+1}{n+1} = \frac{1}{n+1}$.

It remains to be shown that $(\mathcal{G}, s_n)$ has a stationary Nash equilibrium with payoff
$\ge (0, \frac{k}{d(n+1)}, 0, \dots, 0)$ if and only if $\sum_{i=1}^{n} \sqrt{d_i} \ge k$. By Lemma 12, the maximal
payoff player 1 receives in a stationary Nash equilibrium of $(\mathcal{G}(p_i), s_1)$ where
player 0 receives payoff at least 0 equals $\sqrt{p_i} = \sqrt{d_i}/d$. Hence, the maximal
payoff player 1 receives in a stationary Nash equilibrium of $(\mathcal{G}, s_n)$ where player 0
receives payoff at least 0 equals

$\sum_{i=1}^{n} \frac{1}{n+1} \cdot \frac{\sqrt{d_i}}{d} = \frac{\sum_{i=1}^{n} \sqrt{d_i}}{d(n+1)}$.

We conclude that $(\mathcal{G}, s_n)$ has a stationary Nash equilibrium with payoff
$\ge (0, \frac{k}{d(n+1)}, 0, \dots, 0)$ if and only if $\sum_{i=1}^{n} \sqrt{d_i} \ge k$. □
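The telescoping product used in the proof is easy to verify with exact arithmetic. The following Python sketch (the function name is ours) computes, for $q_i = i/(i+1)$, the probability $(1 - q_i) \prod_{j > i} q_j$ of reaching the $i$th copy of $\mathcal{G}(p)$ and confirms that it is $1/(n+1)$ for every $i$.

```python
from fractions import Fraction


def reach_probabilities(n):
    """Probability of entering copy i (i = 1..n) when q_i = i / (i + 1)."""
    q = {i: Fraction(i, i + 1) for i in range(1, n + 1)}
    probs = []
    for i in range(1, n + 1):
        pr = 1 - q[i]                     # branch off towards copy i at t_i
        for j in range(i + 1, n + 1):     # continue past t_n, ..., t_{i+1}
            pr *= q[j]
        probs.append(pr)
    return probs
```

The remaining probability mass $1/(n+1)$ is the probability of reaching $s_0$.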

Again, we can combine our reduction with the game from Example 1 to prove
a stronger result for games that are not turn-based.

Corollary 13. Deciding whether a concurrent 8-player terminal-reward game
has a stationary Nash equilibrium is hard for SqrtSum.

Proof. The proof is analogous to the proof of Corollary 7, but we use the game $\mathcal{G}_1$
from Example 1 instead of the game $\mathcal{G}_2$, and player 0 receives reward 0 in each
state of $\mathcal{G}_1$ and reward $-1$ in the new terminal state. Since $\mathcal{G}_1$ is a terminal-reward
game, the resulting game $\mathcal{G}'$ is a terminal-reward game if the original
game $\mathcal{G}$ is a terminal-reward game. □

Remark 14. The positive results of Sections 4 and 5 can easily be extended to
equilibria in pure or randomised strategies with a memory of a fixed size $k \in \mathbb{N}$:
a nondeterministic algorithm can guess a memory structure $\mathfrak{M}$ of size $k$ and
then look for a positional, respectively stationary, equilibrium in the product of
the original game $\mathcal{G}$ with the memory $\mathfrak{M}$. Hence, for any fixed $k \in \mathbb{N}$, we can
decide in Pspace (NP) the existence of a randomised (pure) equilibrium of size $k$
with payoff $\ge x$ and $\le y$. Moreover, these results extend to stochastic games
(by appealing to results on MDPs with limit-average objectives; see e.g. [22]).

6 Pure Strategies 

In this section, we show that PureNE is decidable and, in fact, NP-complete. Let
$\mathcal{G}$ be a concurrent game, $s \in S$ and $i \in \Pi$. We define

$\mathrm{pval}_i^{\mathcal{G}}(s) = \inf_{\sigma} \sup_{\tau} \mathrm{E}_s^{\sigma_{-i}, \tau}(\phi_i)$,

where $\sigma$ ranges over all pure strategy profiles of $\mathcal{G}$ and $\tau$ ranges over all strategies
of player $i$. Intuitively, $\mathrm{pval}_i^{\mathcal{G}}(s)$ is the lowest payoff that the coalition $\Pi \setminus \{i\}$ can
inflict on player $i$ by playing a pure strategy.

By a reduction to a turn-based two-player zero-sum game, we can show that 
there is a positional strategy profile that attains this value. 

Proposition 15. Let $\mathcal{G}$ be a concurrent game, and $i \in \Pi$. There exists a positional
strategy profile $\sigma^*$ such that $\mathrm{E}_s^{\sigma^*_{-i}, \tau}(\phi_i) \le \mathrm{pval}_i^{\mathcal{G}}(s)$ for all states $s$ and all
strategies $\tau$ of player $i$.

Proof. We define a turn-based two-player zero-sum game $\mathcal{G}'$ with players 0 and 1
as follows: The set of states of $\mathcal{G}'$ is $S' = S \cup (S \times \Gamma^{\Pi})$. At a state $s \in S$, player 1
chooses an action profile $a$ that is legal at $s$, which leads the game to the state
$(s, a)$. At a state of the form $(s, a)$, player 0 chooses an action $b \in \Gamma_i(s)$, which
leads the game to the state $\delta(s, (a_{-i}, b))$. Finally, player 0's reward at a state
$s \in S$ or $(s, a) \in S \times \Gamma^{\Pi}$ is $r'(s) = r'(s, a) = r_i(s)$ (and player 1's reward is
the opposite). By [10], there exists a function $v \colon S' \to \mathbb{Q}$ (the value function)
and positional strategies $\sigma^*$ and $\tau^*$ for player 1 and player 0, respectively, such
that $\mathrm{E}_s^{\sigma^*, \tau}(\phi_0) \le v(s)$ for all $s \in S'$ and all strategies $\tau$ of player 0 in $\mathcal{G}'$, and
$\mathrm{E}_s^{\sigma, \tau^*}(\phi_0) \ge v(s)$ for all $s \in S'$ and all strategies $\sigma$ of player 1 in $\mathcal{G}'$. We can
translate player 1's strategy $\sigma^*$ into a positional strategy profile $\sigma^*$ of $\mathcal{G}$ such that
$\mathrm{E}_s^{\sigma^*_{-i}, \tau}(\phi_i) \le v(s)$ for all states $s \in S$ and all strategies $\tau$ of player $i$ in $\mathcal{G}$. Hence,
$\mathrm{pval}_i^{\mathcal{G}}(s) \le \sup_{\tau} \mathrm{E}_s^{\sigma^*_{-i}, \tau}(\phi_i) \le v(s)$ for all $s \in S$. We claim that $\mathrm{pval}_i^{\mathcal{G}}(s) \ge v(s)$
for all $s \in S$, which implies that $\mathrm{pval}_i^{\mathcal{G}}(s) = v(s)$ for all $s \in S$ and that $\sigma^*$ is the
strategy profile we are looking for. Otherwise, there would exist a pure strategy
profile $\sigma$ in $\mathcal{G}$ such that $\sup_{\tau} \mathrm{E}_s^{\sigma_{-i}, \tau}(\phi_i) < v(s)$ for some $s \in S$. But we could
translate such a strategy profile $\sigma$ into a pure strategy $\sigma'$ of player 1 in $\mathcal{G}'$ such
that $\mathrm{E}_s^{\sigma', \tau^*}(\phi_0) < v(s)$, a contradiction to the optimality of $\tau^*$. □

Given a payoff vector $z \in (\mathbb{R} \cup \{\pm\infty\})^{\Pi}$, we define a directed graph
$G(z) = (V, E)$ (with self-loops) as follows: $V = S$, and there is an edge from $s$ to $t$ if
and only if there is an action profile $a$ with $\delta(s, a) = t$ such that (1) $a$ is legal
at $s$ and (2) $\mathrm{pval}_i^{\mathcal{G}}(\delta(s, (a_{-i}, b))) \le z_i$ for each player $i$ and each action $b \in \Gamma_i(s)$.
Following [3], we call any $a$ that fulfils (1) and (2) $z$-secure at $s$.
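The definition of $G(z)$ translates directly into code. The Python sketch below is an illustration only: the data layout (a dictionary legal_actions of legal actions per state and player, a transition function delta on action profiles, and a precomputed table pval) and all names are our assumptions. It enumerates action profiles explicitly, so it is exponential in the number of players when the game is given succinctly.

```python
from itertools import product


def is_secure(s, profile, players, legal_actions, delta, pval, z):
    """Check condition (2): no unilateral deviation at s leads to a state
    whose value for the deviating player exceeds her threshold."""
    for idx, i in enumerate(players):
        for b in legal_actions[s][i]:
            deviated = profile[:idx] + (b,) + profile[idx + 1:]
            if pval[i][delta(s, deviated)] > z[i]:
                return False
    return True


def build_secure_graph(states, players, legal_actions, delta, pval, z):
    """Return the edge set of G(z) as a set of pairs (s, t)."""
    edges = set()
    for s in states:
        # condition (1): a profile is legal if every entry is a legal action
        for profile in product(*(legal_actions[s][i] for i in players)):
            if is_secure(s, profile, players, legal_actions, delta, pval, z):
                edges.add((s, delta(s, profile)))
    return edges
```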

Lemma 16. Let $z \in (\mathbb{R} \cup \{\pm\infty\})^{\Pi}$. If there exists an infinite path $\pi$ in $G(z)$
from $s_0$ with $z_i \le \phi_i(\pi)$ for each player $i$, then $(\mathcal{G}, s_0)$ has a pure Nash equilibrium
with payoff $\phi_i(\pi)$ for player $i$.

Proof. Let $\pi = s_0 s_1 \dots$ be an infinite path in $G(z)$ from $s_0$ with $z_i \le \phi_i(\pi)$ for each
player $i$. We define a pure strategy profile $\sigma$ as follows: For histories of the form
$x = s_0 a_0 s_1 \dots s_{k-1} a_{k-1} s_k$, we set $\sigma(x)$ to an action profile $a$ with $\delta(s_k, a) = s_{k+1}$
that is $z$-secure at $s_k$. For all other histories $x = t_0 a_0 t_1 \dots t_{k-1} a_{k-1} t_k$, consider the
least $j$ such that $s_{j+1} \ne t_{j+1}$. If $a_j$ differs from a $z$-secure action profile $a$ at $s_j$
in precisely one entry $i$, we set $\sigma(x) = \sigma^*(t_k)$, where $\sigma^*$ is a (fixed) positional
strategy profile such that $\mathrm{E}_s^{\sigma^*_{-i}, \tau}(\phi_i) \le \mathrm{pval}_i^{\mathcal{G}}(s)$ for all $s \in S$ and all strategies $\tau$
of player $i$ (which is guaranteed to exist by Proposition 15); otherwise, $\sigma(x)$ can be
chosen arbitrarily. It is easy to see that $\sigma$ is a Nash equilibrium with induced play $\pi$. □

Lemma 17. Let $\sigma$ be a pure Nash equilibrium of $(\mathcal{G}, s_0)$ with payoff $z$. Then there
exists an infinite path $\pi$ in $G(z)$ from $s_0$ with $\phi_i(\pi) = z_i$ for each player $i$.

Proof. Let $s_0 a_0 s_1 a_1 \dots$ be the play induced by $\sigma$. We claim that $\pi := s_0 s_1 \dots$ is
a path in $G(z)$. Otherwise, consider the least $k$ such that $(s_k, s_{k+1})$ is not an
edge in $G(z)$. Hence, there exists no $z$-secure action profile at $s := s_k$. Since
$a_k$ is certainly legal at $s$, there exists a player $i$ and an action $b \in \Gamma_i(s)$ such that
$\mathrm{pval}_i^{\mathcal{G}}(\delta(s, ((a_k)_{-i}, b))) > z_i$. But then player $i$ can improve her payoff by switching
to a strategy that mimics $\sigma_i$ until $s$ is reached, then plays action $b$, and after that
mimics a strategy that ensures payoff $> z_i$ against any pure strategy profile. This
contradicts the assumption that $\sigma$ is a Nash equilibrium. □

Using Lemmas 16 and 17, we can reduce the task of finding a pure Nash
equilibrium to the task of finding a path in a multi-weighted graph whose
limit-average weight vector falls between two thresholds. The latter problem can
be solved in polynomial time by solving a linear programme with one variable
for each pair of a weight function and an edge in the graph, as we prove in the
appendix.

Theorem 18. Given a finite directed graph $G = (V, E)$ with weight functions
$r_0, \dots, r_{k-1} \colon V \to \mathbb{Q}$, $v_0 \in V$, and $x, y \in (\mathbb{Q} \cup \{\pm\infty\})^k$, we can decide in
polynomial time whether there exists an infinite path $\pi = v_0 v_1 \dots$ in $G$ with
$x_i \le \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} r_i(v_j) \le y_i$ for all $i = 0, \dots, k - 1$.

We can now describe a nondeterministic algorithm to decide the existence
of a pure Nash equilibrium with payoff $\ge x$ and $\le y$ in polynomial time.
The algorithm starts by guessing, for each player $i$, a positional strategy profile $\sigma^i$
of $\mathcal{G}$ and computes $p_i(s) := \sup_{\tau} \mathrm{E}_s^{\sigma^i_{-i}, \tau}(\phi_i)$ for each $s \in S$; these numbers can
be computed in polynomial time using the algorithm given by Karp [18]. The
algorithm then guesses a vector $z \in (\mathbb{R} \cup \{\pm\infty\})^{\Pi}$ by setting $z_i$ either to $x_i$ or
to $p_i(s)$ for some $s \in S$ with $x_i \le p_i(s)$, and constructs the graph $G'(z)$, which is
defined as $G(z)$ but with $p_i(s)$ substituted for $\mathrm{pval}_i^{\mathcal{G}}(s)$. Finally, the algorithm
determines (in polynomial time) whether there exists an infinite path $\pi$ in $G'(z)$
from $s_0$ with $z_i \le \phi_i(\pi) \le y_i$ for all $i \in \Pi$. If such a path exists, the algorithm
accepts; otherwise it rejects.
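Once the positional strategies of all players other than $i$ are fixed, player $i$ faces a finite graph with rewards on the states, and $p_i(s)$ is the largest mean reward of a cycle reachable from $s$. The following Python sketch computes this number with the maximisation variant of Karp's recurrence. It is our own illustrative rendering, not code from [18]; the names succ and weight and the dictionary encoding of the graph are assumptions, and we assume that terminal states are modelled as self-loops.

```python
from fractions import Fraction


def max_mean_cycle_from(source, succ, weight):
    """Maximum mean weight of a cycle reachable from `source`, or None if
    no cycle is reachable. weight[u] is the reward collected when leaving u."""
    # restrict attention to the states reachable from the source
    reach, seen = [source], {source}
    for u in reach:
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                reach.append(v)
    n = len(reach)
    # D[k][v]: maximum total weight of a walk with exactly k edges source -> v
    D = [{v: None for v in reach} for _ in range(n + 1)]
    D[0][source] = 0
    for k in range(1, n + 1):
        for u in reach:
            if D[k - 1][u] is None:
                continue
            for v in succ.get(u, ()):
                cand = D[k - 1][u] + weight[u]
                if D[k][v] is None or cand > D[k][v]:
                    D[k][v] = cand
    best = None
    for v in reach:
        if D[n][v] is None:
            continue
        # Karp's formula (max version): max over v of min over k
        worst = min(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] is not None)
        if best is None or worst > best:
            best = worst
    return best
```

Calling the function once per state $s$ yields all numbers $p_i(s)$ in polynomial time.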

Theorem 19. PureNE is in NP.

Proof. We claim that the algorithm described above is correct, i.e. sound and
complete. To prove soundness, assume that the algorithm accepts its input.
Hence, there exists an infinite path $\pi$ in $G'(z)$ from $s_0$ with $z_i \le \phi_i(\pi) \le y_i$.
Since $\mathrm{pval}_i^{\mathcal{G}}(s) \le p_i(s)$ for all $i \in \Pi$ and $s \in S$, the graph $G'(z)$ is a subgraph
of $G(z)$. Hence, $\pi$ is also an infinite path in $G(z)$. By Lemma 16, we can conclude
that $(\mathcal{G}, s_0)$ has a pure Nash equilibrium with payoff $\ge z \ge x$ and $\le y$.

To prove that the algorithm is complete, let $\sigma$ be a pure Nash equilibrium of
$(\mathcal{G}, s_0)$ with payoff $z$, where $x \le z \le y$. By Proposition 15, the algorithm
can guess positional strategy profiles $\sigma^i$ such that $p_i(s) = \mathrm{pval}_i^{\mathcal{G}}(s)$ for all
$s \in S$. If the algorithm additionally guesses the payoff vector $z'$ defined by
$z_i' = \max\{x_i, \mathrm{pval}_i^{\mathcal{G}}(s) : s \in S, \mathrm{pval}_i^{\mathcal{G}}(s) \le z_i\}$ for all $i \in \Pi$, then the graph $G(z)$
coincides with the graph $G(z')$ (and thus with $G'(z')$). By Lemma 17, there exists
an infinite path $\pi$ in $G(z)$ from $s_0$ such that $z_i' \le z_i = \phi_i(\pi) \le y_i$ for all $i \in \Pi$.
Hence, the algorithm accepts. □

The following theorem shows that PureNE is NP-hard. In fact, NP-hardness
holds even for turn-based games with rewards 0 and 1.

Theorem 20. PureNE is NP-hard, even for turn-based games with rewards 0
and 1.

Proof. Again, we reduce from SAT. Given a Boolean formula $\varphi = C_1 \wedge \dots \wedge C_m$
in conjunctive normal form over propositional variables $X_1, \dots, X_n$, where w.l.o.g.
$m \ge 1$ and each clause is nonempty, let $\mathcal{G}$ be the turn-based game described in
the proof of Theorem 9 and depicted in Figure 6. We claim that the following
statements are equivalent:

1. $\varphi$ is satisfiable.

2. $(\mathcal{G}, C_1)$ has a positional Nash equilibrium with payoff $\ge 1$ for player 0.

3. $(\mathcal{G}, C_1)$ has a pure Nash equilibrium with payoff $\ge 1$ for player 0.

Since the implication (1. $\Rightarrow$ 2.) was already proved in the proof of Theorem 9
and the implication (2. $\Rightarrow$ 3.) is trivial, we only need to prove that 3. implies 1.
Hence, assume that $(\mathcal{G}, C_1)$ has a pure Nash equilibrium $\sigma$ with payoff $\ge 1$ for
player 0. Since player 0 receives payoff $\ge 1$, the terminal state $\bot$ is not reached
in the induced play $\pi$. Consider the variable assignment $\alpha$ that maps $X_i$ to true
if and only if player $i$ receives payoff $< 1$ from $\pi$; we claim that $\alpha$ satisfies the
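formula. Consider any clause $C$. Set $T = \{(C, X_i), (C, \neg X_i) : i = 1, \dots, n\}$, and
denote by $\mathbb{1}_s$ the characteristic function of $s \in T$. We have

$\sum_{s \in T} \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} -\mathbb{1}_s(\pi(j)) \le \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \sum_{s \in T} -\mathbb{1}_s(\pi(j)) = -\frac{1}{2m} < 0$.

In particular, there exists a state $s = (C, L)$ such that

$\liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} -\mathbb{1}_s(\pi(j)) < 0$.

If $L = X_i$, then $r_i \le 1 - \mathbb{1}_s$. Hence, $\phi_i(\pi) < 1$, and $\alpha$ maps $X_i$ to true, thereby
satisfying $C$. If $L = \neg X_i$, then player $i$ must receive payoff 1, because otherwise
she could improve her payoff by playing from $s$ to $\bot$. Hence, $\alpha$ maps $X_i$ to false
and satisfies $C$. □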

It follows from Theorems 19 and 20 that PureNE is NP-complete. By combin- 
ing our reduction with a game that has no pure Nash equilibrium, we can prove 
the following stronger result for non-turn-based games. 

Corollary 21. Deciding the existence of a pure Nash equilibrium in a concurrent
limit-average game is NP-complete, even for games with rewards 0 and 1.

Proof. The proof is analogous to the proof of Corollary 10. □ 

Note that Theorem 20 and Corollary 21 do not apply to terminal-reward
games. In fact, PureNE is decidable in P for these games, which follows from two
facts about terminal-reward games: (1) the numbers $\mathrm{pval}_i^{\mathcal{G}}(s)$ can be computed
in polynomial time (using a reduction to a turn-based two-player zero-sum game
and applying a result of Washburn [31]), and (2) the only possible vectors that
can emerge as the payoff of a pure strategy profile are the zero vector and the
reward vectors at terminal states.

Theorem 22. PureNE is in P for terminal-reward games.

7 Randomised Strategies

In this section, we show that the problem NE is undecidable and, in fact, not
recursively enumerable for turn-based terminal-reward games. The proof proceeds
by a reduction from an undecidable problem about two-counter machines.
Such a machine is of the form $\mathcal{M} = (Q, q_0, \Delta)$, where

- $Q$ is a finite set of states,

- $q_0 \in Q$ is the initial state,

- $\Delta \subseteq Q \times \Gamma \times Q$ is a set of transitions.

The set $\Gamma$ specifies which instructions $\mathcal{M}$ may perform on its counters. For our
purposes, the instruction set $\Gamma := \{\mathrm{inc}(j), \mathrm{dec}(j), \mathrm{zero}(j) : j = 1, 2\}$ suffices: a
counter can be incremented, decremented, or tested for zero. For $q \in Q$ we
write $q\Delta$ for the set of all $(\gamma, q') \in \Gamma \times Q$ such that $(q, \gamma, q') \in \Delta$. The machine $\mathcal{M}$
is deterministic if for each $q \in Q$ either (1) $q\Delta = \emptyset$, (2) $q\Delta = \{(\mathrm{inc}(j), q')\}$ for
some $j \in \{1, 2\}$ and $q' \in Q$, or (3) $q\Delta = \{(\mathrm{zero}(j), q_1), (\mathrm{dec}(j), q_2)\}$ for some
$j \in \{1, 2\}$ and $q_1, q_2 \in Q$.

A configuration of $\mathcal{M}$ is a triple $C = (q, i_1, i_2) \in Q \times \mathbb{N} \times \mathbb{N}$, where $q$ denotes
the current state and $i_j$ denotes the current value of counter $j$. A configuration
$C' = (q', i_1', i_2')$ is a successor of configuration $C = (q, i_1, i_2)$, denoted by
$C \vdash C'$, if there exists a "matching" transition $(q, \gamma, q') \in \Delta$. For example,
$(q, i_1, i_2) \vdash (q', i_1 + 1, i_2)$ if and only if $(q, \mathrm{inc}(1), q') \in \Delta$. The instruction
$\mathrm{zero}(j)$ performs a zero test: $(q, i_1, i_2) \vdash (q', i_1, i_2)$ if and only if $i_1 = 0$ and
$(q, \mathrm{zero}(1), q') \in \Delta$, or $i_2 = 0$ and $(q, \mathrm{zero}(2), q') \in \Delta$.

A partial computation of $\mathcal{M}$ is a sequence $\rho = \rho(0)\rho(1) \dots$ of configurations
such that $\rho(0) \vdash \rho(1) \vdash \cdots$ and $\rho(0) = (q_0, 0, 0)$ (the initial configuration).
A partial computation of $\mathcal{M}$ is a computation of $\mathcal{M}$ if it is infinite or it ends
in a configuration $C$ for which there is no $C'$ with $C \vdash C'$. Note that each
deterministic two-counter machine has a unique computation.
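The successor relation can be made concrete with a small simulator. The Python sketch below is illustrative only: the dictionary encoding of $\Delta$ and the function names are our own, and we assume that a $\mathrm{dec}(j)$ transition matches only when counter $j$ is positive. Since the halting problem is undecidable, the simulator can only follow the computation for a bounded number of steps.

```python
def step(delta, config):
    """Return the successor configuration of config = (q, i1, i2),
    or None if no transition matches. delta maps a state q to a list of
    triples (instruction, counter, next_state)."""
    q, counters = config[0], list(config[1:])
    for instr, j, q_next in delta.get(q, ()):
        value = counters[j - 1]
        if instr == "inc":
            counters[j - 1] = value + 1
        elif instr == "dec" and value > 0:    # assumed: dec needs a positive counter
            counters[j - 1] = value - 1
        elif instr == "zero" and value == 0:  # zero test leaves the counters unchanged
            pass
        else:
            continue                          # transition does not match
        return (q_next, counters[0], counters[1])
    return None


def run(delta, q0, max_steps):
    """Follow the computation from (q0, 0, 0) for at most max_steps steps."""
    config = (q0, 0, 0)
    trace = [config]
    for _ in range(max_steps):
        config = step(delta, config)
        if config is None:
            break
        trace.append(config)
    return trace
```

For a deterministic machine at most one transition matches in each configuration, so the order in which the transitions of a state are listed does not matter.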

The halting problem is to decide, given a machine $\mathcal{M}$, whether the computation
of $\mathcal{M}$ is finite. It is well-known that deterministic two-counter machines are
Turing powerful, which makes the halting problem and its dual, the non-halting
problem, undecidable, even when restricted to deterministic two-counter machines.
In fact, the non-halting problem for deterministic two-counter machines
is not recursively enumerable.

To prove the undecidability of NE, we employ a reduction from the non-halting
problem for deterministic two-counter machines. More precisely, we
show how to compute from such a machine $\mathcal{M}$ a game $(\mathcal{G}, s_0)$ such that the
computation of $\mathcal{M}$ is infinite if and only if there exists a Nash equilibrium of
$(\mathcal{G}, s_0)$ where player 0 receives expected payoff $\ge 0$. Without loss of generality,
we assume that in $\mathcal{M}$ there is no zero test that is followed by another zero test:
if $(q, \mathrm{zero}(j), q') \in \Delta$, then $|q'\Delta| \le 1$.

The game $\mathcal{G}$ is played by players 0, 1 and 12 other players $A_j^t$, $B_j^t$, $D^t$ and $E_j$,
indexed by $j \in \{1, 2\}$ and $t \in \{0, 1\}$. Intuitively, player 0 and player 1 build
up the computation of $\mathcal{M}$: player 0 updates the counters, and player 1 chooses
transitions. Players $A_j^t$ and $B_j^t$ make sure that player 0 updates the counters
correctly: players $A_j^0$ and $A_j^1$ ensure that, in each step, the value of counter $j$
is not too high, and players $B_j^0$ and $B_j^1$ ensure that, in each step, the value of
counter $j$ is not too low. More precisely, $A_j^0$ and $B_j^0$ monitor the even steps of the
computation, while $A_j^1$ and $B_j^1$ monitor the odd steps. Finally, players $D^t$ and $E_j$
ensure that player 0 uses a randomised strategy of a restricted form.

Let $\Gamma' := \Gamma \cup \{\mathrm{init}\}$. For each $q \in Q$, each $\gamma \in \Gamma'$, each $j \in \{1, 2\}$ and each
$t \in \{0, 1\}$, the game $\mathcal{G}$ contains the gadgets $S^t_{\gamma,q}$, $I^t_q$ and $C^t_{\gamma,j}$, which are depicted
in Figure 9. The initial state of $\mathcal{G}$ is $s_0 := s^0_{\mathrm{init},q_0}$. Note that in the gadget $S^t_{\gamma,q}$,
each of the players $A_j^t$, $B_j^t$, and $E_j$ may unilaterally decide to quit the game,
which gives the respective player a payoff of 1 or 2, but payoff $-1$ to player 0.

It will turn out that player 1 will play a pure strategy in any Nash equilibrium
of $(\mathcal{G}, s_0)$ where player 0 receives expected payoff 0, except possibly for histories
that are not consistent with the equilibrium. Moreover, player 0 has to play
a uniform distribution inside $S^t_{\gamma,q}$. Formally, we say that a strategy profile $\sigma$
of $\mathcal{G}$ is safe if 1. $\sigma_0(xs)$ assigns probability $\frac{1}{2}$ to both outgoing transitions for all
histories $xs$ consistent with $\sigma$ and ending in a state $s \in S^t_{\gamma,q}$ controlled by player 0,
and 2. $\sigma_1(xs)$ is degenerate for all histories $xs$ consistent with $\sigma$ and ending in a
state $s$ controlled by player 1.

For each safe strategy profile $\sigma$ where player 0 receives expected payoff 0,
let $x_0 s_0 \prec x_1 s_1 \prec x_2 s_2 \prec \dots$ ($x_i \in S^*$, $s_i \in S$, $x_0 = \varepsilon$) be the unique sequence
consisting of all histories $xs$ of $(\mathcal{G}, s_0)$ consistent with $\sigma$ that end in a state $s$
of the form $s = s^t_{\gamma,q}$. This sequence is infinite because $\sigma$ is safe and player 0
receives expected payoff 0. Additionally, let $q_0, q_1, \dots$ be the corresponding
sequence of states and $\gamma_0, \gamma_1, \dots$ be the corresponding sequence of instructions,
i.e. $s_n = s^0_{\gamma_n,q_n}$ or $s_n = s^1_{\gamma_n,q_n}$ for all $n \in \mathbb{N}$. For each $j \in \{1, 2\}$ and $n \in \mathbb{N}$,
we define two conditional expectations as follows:

$a_j^n := \mathrm{E}^{\sigma}_{s_0}(\phi_{A_j^{n \bmod 2}} \mid x_n s_n \cdot S^{\omega})$;
$b_j^n := \mathrm{E}^{\sigma}_{s_0}(\phi_{B_j^{n \bmod 2}} \mid x_n s_n \cdot S^{\omega})$.

Note that at every terminal state of the counter gadgets $C^t_{\gamma,j}$ and $C^{1-t}_{\gamma,j}$ the rewards
of player $A_j^t$ and player $B_j^t$ sum up to 4. For each $j$, the conditional probability
that, given the history $x_n s_n$, we reach such a state is $\sum_{k \in \mathbb{N}} \frac{1}{2^k} \cdot \frac{1}{4} = \frac{1}{2}$. Hence,
$a_j^n + b_j^n = 2$ for all $n \in \mathbb{N}$. We say that $\sigma$ is stable if $a_j^n = 1$ or, equivalently, $b_j^n = 1$
for each $j \in \{1, 2\}$ and for all $n \in \mathbb{N}$.

Figure 9. Simulating a two-counter machine.

Finally, for each $j \in \{1, 2\}$ and $n \in \mathbb{N}$, we define a number $c_j^n \in [0, 1]$ as
follows: After the history $x_n s_n$, with probability $\frac{1}{4}$ the play proceeds to the state
controlled by player 0 in the counter gadget $C^{n \bmod 2}_{\gamma_n,j}$. The number $c_j^n$ is defined
as the probability that player 0 plays to the neighbouring grey state. Note that,
by the construction of $\mathcal{G}$, it holds that $c_j^n = 1$ if $\gamma_n = \mathrm{zero}(j)$ or $\gamma_n = \mathrm{init}$. In
particular, $c_1^0 = c_2^0 = 1$.

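Lemma 23. Let $\sigma$ be a safe strategy profile with expected payoff 0 for player 0.
Then $\sigma$ is stable if and only if

$c_j^{n+1} = \begin{cases} \frac{1}{2} \cdot c_j^n & \text{if } \gamma_{n+1} = \mathrm{inc}(j), \\ 2 \cdot c_j^n & \text{if } \gamma_{n+1} = \mathrm{dec}(j), \\ c_j^n = 1 & \text{if } \gamma_{n+1} = \mathrm{zero}(j), \\ c_j^n & \text{otherwise,} \end{cases} \qquad (1)$

for each $j \in \{1, 2\}$ and for all $n \in \mathbb{N}$.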

To prove the lemma, consider a safe strategy profile $\sigma$ of $\mathcal{G}$ with expected
payoff 0 for player 0. For each $j \in \{1, 2\}$ and $n \in \mathbb{N}$, we define yet another
conditional expectation $p_j^n$.

The following claim relates the numbers $a_j^n$ and $p_j^n$.

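Claim. Let $j \in \{1, 2\}$. Then $a_j^n = 1$ for all $n \in \mathbb{N}$ if and only if $p_j^n = \frac{3}{4}$ for all
$n \in \mathbb{N}$.

Proof. ($\Rightarrow$) Assume that $a_j^n = 1$ for all $n \in \mathbb{N}$. We have $a_j^n = p_j^n + \frac{1}{4} \cdot a_j^{n+2}$ and
therefore $1 = p_j^n + \frac{1}{4}$ for all $n \in \mathbb{N}$. Hence, $p_j^n = \frac{3}{4}$ for all $n \in \mathbb{N}$.

($\Leftarrow$) Assume that $p_j^n = \frac{3}{4}$ for all $n \in \mathbb{N}$. Since $a_j^n = p_j^n + \frac{1}{4} \cdot a_j^{n+2}$ for all $n \in \mathbb{N}$,
the numbers $a_j^n$ have to satisfy the following recurrence: $a_j^{n+2} = 4a_j^n - 3$. Since
all the numbers $a_j^n$ are bounded by the minimum and maximum reward for
player $A_j^{n \bmod 2}$, we have $0 \le a_j^n \le 4$ for all $n \in \mathbb{N}$. It is easy to see that the only
values for $a_j^0$ and $a_j^1$ such that $0 \le a_j^n \le 4$ for all $n \in \mathbb{N}$ are $a_j^0 = a_j^1 = 1$. But this
implies that $a_j^n = 1$ for all $n \in \mathbb{N}$. □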
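The step marked as easy to see rests on the fact that the recurrence $a \mapsto 4a - 3$ has the single fixed point 1 and multiplies the distance to it by 4 in each application. The following Python sketch (function names are ours) illustrates this with exact arithmetic: any starting value other than 1 leaves the interval $[0, 4]$ after a few iterations.

```python
from fractions import Fraction


def iterate(a0, steps):
    """Return a0 followed by `steps` iterates of a -> 4a - 3."""
    values = [Fraction(a0)]
    for _ in range(steps):
        values.append(4 * values[-1] - 3)
    return values


def stays_bounded(a0, steps=20):
    """Check whether the first iterates all remain inside [0, 4]."""
    return all(0 <= a <= 4 for a in iterate(a0, steps))
```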

Proof (of Lemma 23). By the previous claim, it suffices to show that $p_j^n = \frac{3}{4}$ if and
only if (1) holds. Let $j \in \{1, 2\}$, $n \in \mathbb{N}$ and $t = n \bmod 2$. The number $p_j^n$ can be
expressed as a weighted average of the expected payoff for player $A_j^t$ inside $C^t_{\gamma_n,j}$
and the expected payoff for player $A_j^t$ inside $C^{1-t}_{\gamma_{n+1},j}$. The first payoff does not
depend on $\gamma_n$, but the second depends on $\gamma_{n+1}$. Let us consider the case that
$\gamma_{n+1} = \mathrm{inc}(j)$. In this case, $p_j^n$ equals

$\frac{1}{4} \cdot (c_j^n \cdot 2 + (1 - c_j^n) \cdot 3) + \frac{1}{8} \cdot c_j^{n+1} \cdot 4 = \frac{3}{4} - \frac{1}{4} c_j^n + \frac{1}{2} \cdot c_j^{n+1}$.

Obviously, this sum equals $\frac{3}{4}$ if and only if $c_j^{n+1} = \frac{1}{2} \cdot c_j^n$. For any other value
of $\gamma_{n+1}$, the argumentation is similar. □
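The increment case can be checked with exact arithmetic. The sketch below assumes the weights $\frac{1}{4}$ and $\frac{1}{8}$ and the rewards 2, 3 and 4 as we read them from the display above; the function name is ours. With $c_j^n = 2^{-i}$ encoding counter value $i$, halving $c$ corresponds to incrementing the counter.

```python
from fractions import Fraction


def p_inc(c_now, c_next):
    """Value of p_j^n in the case gamma_{n+1} = inc(j), as a function of
    c_j^n and c_j^{n+1} (weights and rewards are assumptions)."""
    first = Fraction(1, 4) * (c_now * 2 + (1 - c_now) * 3)
    second = Fraction(1, 8) * c_next * 4
    return first + second
```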

The next lemma states that every Nash equilibrium with expected payoff 0
for player 0 is, in fact, safe.

Lemma 24. Let $\sigma$ be a Nash equilibrium of $(\mathcal{G}, s_0)$ with expected payoff 0 for
player 0. Then $\sigma$ is safe.

Proof. We start by proving that player 0 plays a uniform distribution inside $S^t_{\gamma,q}$.
We prove this separately for histories that end in a white state and histories that
end in a grey state.

Let $xs$ be a history consistent with $\sigma$ and ending in a white state $s \in S^t_{\gamma,q}$
controlled by player 0. Since the players $E_1$ and $E_2$ can ensure payoff 1 by
quitting the game, player 0 has to play to both successors with probability $\frac{1}{2}$ each.
Otherwise, $\sigma$ would not be a Nash equilibrium.

Now let $xs$ be a history consistent with $\sigma$ and ending in a grey state $s \in S^t_{\gamma,q}$
controlled by player 0. In the following, let $t = 0$; the proof for $t = 1$ is analogous.
Denote by $p$ the probability that player 0 plays to the non-terminal successor
of $s$ after the history $xs$, and for $i \in \{0, 1\}$ let $d^i$ be the expected payoff of
player $D^i$ conditional on this move being taken.

By the definition of the game, we have $d^0 \ge 1$ and $d^1 \ge 2$. On the other hand,
since at every terminal state the sum of the rewards for players $D^0$ and $D^1$ is at
most 3, we have $d^0 + d^1 \le 3$. Hence, $d^0 = 1$ and $d^1 = 2$. Consider the expected
payoffs for players $D^0$ and $D^1$ after the history $xs$:

$\mathrm{E}^{\sigma}_{s_0}(\phi_{D^0} \mid xs \cdot S^{\omega}) = (1 - p) \cdot 3 + p \cdot d^0 = 3 - 2p$;
$\mathrm{E}^{\sigma}_{s_0}(\phi_{D^1} \mid xs \cdot S^{\omega}) = p \cdot d^1 = 2p$.

Since $\sigma$ is a Nash equilibrium, these numbers are bounded from below by 2
and 1, respectively (otherwise, it would be better for player $D^0$ or $D^1$ to quit the
game). Hence, $p = \frac{1}{2}$.

To prove that $\sigma$ is safe, it remains to be shown that player 1 plays a degenerate
distribution for all histories $xs$ consistent with $\sigma$ and ending in a
state $s \in I^t_q$. Towards a contradiction, assume that $xs$ is such a history and
that $\sigma_1(xs)$ assigns probability $> 0$ to two distinct successor states. Hence,
$q\Delta = \{(\mathrm{zero}(j), q_1), (\mathrm{dec}(j), q_2)\}$ for some $j \in \{1, 2\}$ and $q_1, q_2 \in Q$. By our
assumption that there are no consecutive zero tests and since player 0 receives
expected payoff 0,

E'l ((pi \ XS ■ S^~^ ■S'^)>1, 

but 

Hence, player 1 could improve her payoff by playing to s^~[^^y^ with probabil- 
ity 1, a contradiction to cr being a Nash equilibrium. □ 

Finally, we can prove the following theorem. 

Theorem 25. NE is not recursively enumerable, even for turn-based 14-player 
terminal-reward games. 

Proof. We claim that the function mapping a deterministic two-counter machine
$\mathcal{M}$ to the 14-player game $(\mathcal{G}, s_0)$ as described above realises a many-one
reduction from the non-halting problem to NE. Clearly, $\mathcal{G}$ can be computed
from $\mathcal{M}$. We prove that the computation of $\mathcal{M}$ is infinite if and only if $(\mathcal{G}, s_0)$
has a Nash equilibrium in which player 0 receives expected payoff (at least) 0.

Assume that the computation p = p{0)p{l) . . . of is infinite. Player O's 
equilibriimi strategy (Tq can be described as follows: For a history that ends at 
the unique state controlled by player in the gadget ^ after visiting a state of 
the form Sy ^ or sj|,7^ exactly n > times, player plays to the grey successor 
state with probability where i is the value of counter / in configuration 
p{n — 1). Moreover, for a history that ends at a state controlled by player in 
the gadget Sf^,^, player plays to both successors with probability ^ each. 

The only place where player 1 has a choice is the sole state in the gadget $I_q$ for $q\Delta = \{(\mathrm{zero}(j), q_1), (\mathrm{dec}(j), q_2)\}$. If the play arrives at such a state after visiting a state of the form $s^0_{\gamma,q}$ or $s^1_{\gamma,q}$ exactly $n > 0$ times, then player 1's pure strategy $\sigma_1$ prescribes to play to the successor state for the instruction $\mathrm{zero}(j)$ if the value of counter $j$ in configuration $\rho(n-1)$ is zero, and to the successor state for the instruction $\mathrm{dec}(j)$ if the value of counter $j$ in configuration $\rho(n-1)$ is non-zero.

Any other player's pure strategy is defined as follows: After a history ending in $S^t_{\gamma,q}$, the strategy prescribes to quit the game if and only if the history is not compatible with $\rho$ (i.e. the corresponding sequence of instructions does not match $\rho$).

Note that the resulting strategy profile $\sigma$ is safe. Moreover, since player 0 and player 1 follow the computation of $\mathcal{M}$, a terminal state inside one of the counter gadgets $C^t_{j,\gamma}$ is reached with probability 1. Since player 0 receives reward 0 at any such terminal state, player 0's expected payoff equals 0. Finally, by the definition of $\sigma$, for each $j \in \{1,2\}$ and for all $n \in \mathbb{N}$, if $i$ and $i'$ are the values of counter $j$ in configuration $\rho(n)$ and configuration $\rho(n+1)$, respectively, then $c^n_j = 2^{-i}$, $c^{n+1}_j = 2^{-i'}$, and $\gamma_{n+1}$ is the instruction corresponding to the counter update from $\rho(n)$ to $\rho(n+1)$. Hence, (1) holds, and we can conclude from Lemma 23 that $\sigma$ is stable.

We claim that $\sigma$ is, in fact, a Nash equilibrium of $(\mathcal{G}, s_0)$: It is obvious that player 0 cannot improve her payoff. If player 1 deviates, then with positive probability we reach a history that is not compatible with $\rho$; hence, one of the other players will quit the game, which ensures that player 1 will receive payoff 0 after this history. Since $\sigma$ is stable, none of the players $A^t_j$ or $B^t_j$ can improve her
payoff. Finally, the expected payoffs of player and player D^^' from sf^,^ equal 
2 and 1, respectively, which is the same as they would get if they quit the game. 
The reasoning for players $E_1$ and $E_2$ is analogous.

(⇐) Assume that $\sigma$ is a Nash equilibrium of $(\mathcal{G}, s_0)$ with expected payoff $\ge 0$ for player 0. Since 0 is the maximum reward for player 0, this means that the expected payoff of $\sigma$ for player 0 equals 0. From Lemma 24, we can conclude that $\sigma$ is safe. To apply Lemma 23 and obtain (1), it remains to be shown that $\sigma$ is stable. In order to derive a contradiction, assume that there exist $j \in \{1,2\}$ and $n \in \mathbb{N}$ such that either $a^n_j < 1$ or $a^n_j > 1$, i.e. $b^n_j < 1$. In the first case, player $A^t_j$ (with $t = n \bmod 2$) could improve her payoff by quitting the game after history $x_n s_n$, while in the second case, player $B^t_j$ could improve her payoff by quitting the game, again a contradiction to $\sigma$ being a Nash equilibrium.

From (1) and the fact that $c^0_j = 1$, it follows that each $c^n_j$ is of the form $c^n_j = 2^{-i}$ with $i \in \mathbb{N}$. We denote by $i^n_j$ the unique number $i$ such that $c^n_j = 2^{-i}$ and set $\rho(n) = (q_n, i^n_1, i^n_2)$ for each $n \in \mathbb{N}$. We claim that $\rho := \rho(0)\rho(1)\ldots$ is in fact the computation of $\mathcal{M}$. In particular, this computation is infinite. It suffices to verify the following two properties:

- $\rho(0) = (q_0, 0, 0)$.

- $\rho(n) \vdash \rho(n+1)$ for all $n \in \mathbb{N}$.

The first property is immediate. To prove the second property, let $\rho(n) = (q, i_1, i_2)$ and $\rho(n+1) = (q', i'_1, i'_2)$. Hence, $s_n$ lies inside $S^t_{\gamma,q}$, and $s_{n+1}$ lies inside $S^{1-t}_{\gamma',q'}$ for suitable $\gamma$, $\gamma'$ and $t = n \bmod 2$. We only prove the claim for $q\Delta = \{(\mathrm{zero}(1), q_1), (\mathrm{dec}(1), q_2)\}$; the other cases are similar. Note that, by the construction of the gadget $I_q$, it must be the case that either $q' = q_1$ and $\gamma' = \mathrm{zero}(1)$, or $q' = q_2$ and $\gamma' = \mathrm{dec}(1)$. By (1), if $\gamma' = \mathrm{zero}(1)$, then $i'_1 = i_1 = 0$ and $i'_2 = i_2$, and if $\gamma' = \mathrm{dec}(1)$, then $i'_1 = i_1 - 1$ and $i'_2 = i_2$. This implies $\rho(n) \vdash \rho(n+1)$: On the one hand, if $i_1 = 0$, then $i'_1 \ne i_1 - 1$, which implies $\gamma' \ne \mathrm{dec}(1)$ and thus $\gamma' = \mathrm{zero}(1)$, $q' = q_1$ and $i'_1 = i_1 = 0$. On the other hand, if $i_1 > 0$, then $\gamma' \ne \mathrm{zero}(1)$ and thus $\gamma' = \mathrm{dec}(1)$, $q' = q_2$ and $i'_1 = i_1 - 1$. □
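
The successor relation between configurations used in this proof can be illustrated by a small simulator. This is only a sketch under assumptions: the instruction names `inc`, `dec` and `zero`, the encoding of the transition relation as a Python dictionary `delta`, and the function name `step` are hypothetical choices made for illustration; the paper's formal definition of two-counter machines is given in an earlier section.

```python
def step(delta, config):
    """Return the successor of config = (q, i1, i2), or None if none exists.

    delta maps a state q to a list of ((op, j), q_next) pairs, where op is
    one of "inc", "dec", "zero" and j in {1, 2} names the counter.
    (Hypothetical encoding, chosen for illustration only.)
    """
    q, i1, i2 = config
    for (op, j), q_next in delta.get(q, ()):
        counters = [i1, i2]
        value = counters[j - 1]
        if op == "inc":
            counters[j - 1] = value + 1
        elif op == "dec" and value > 0:      # dec(j) is enabled only if counter j > 0
            counters[j - 1] = value - 1
        elif op == "zero" and value == 0:    # zero(j) is enabled only if counter j == 0
            pass
        else:
            continue                         # instruction not enabled, try the next one
        return (q_next, counters[0], counters[1])
    return None


# A machine that never halts: increment counter 1, then decrement it again.
delta = {
    "q0": [(("inc", 1), "q1")],
    "q1": [(("zero", 1), "q0"), (("dec", 1), "q0")],
}
```

In a state with the two options zero(j) and dec(j), exactly one of them is enabled in any configuration, which mirrors the case distinction on $i_1 = 0$ versus $i_1 > 0$ above.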

For games that are not turn-based, we can show the stronger theorem that the set of all games that have a Nash equilibrium is not recursively enumerable.

Corollary 26. The set of all initialised concurrent 14-player terminal-reward games that have a Nash equilibrium is not recursively enumerable.

Proof. The proof is analogous to the proof of Corollary 10, but we use the game $\mathcal{G}_1$ from Example 1 instead of the game $\mathcal{G}_2$, and we set the reward for player 0 in each state of $\mathcal{G}_1$ to 0. □

8 Conclusion 

We have analysed the complexity of Nash equilibria in concurrent games with limit-average objectives. In particular, we have shown that randomisation in strategies leads to undecidability, while restricting to pure strategies retains decidability. This is in contrast to stochastic games, where pure strategies lead to undecidability [27]. While we have provided matching upper and lower bounds in most cases, there remain some problems where we do not know the exact complexity. Apart from StatNE, these include the problem PureNE when restricted to a bounded number of players.

Appendix

This appendix is devoted to the proof of Theorem 18, which is restated here. 

Theorem 18. Given a finite directed graph $G = (V, E)$ with weight functions $r_0, \ldots, r_{k-1} \colon V \to \mathbb{Q}$, $v_0 \in V$, and $x, y \in (\mathbb{Q} \cup \{\pm\infty\})^k$, we can decide in polynomial time whether there exists an infinite path $\pi = v_0 v_1 \ldots$ in $G$ with $x_i \le \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} r_i(v_j) \le y_i$ for all $i = 0, \ldots, k-1$.

In the following, let $G = (V, E)$ be a finite directed graph with weight functions $r_0, \ldots, r_{k-1} \colon V \to \mathbb{Q}$, and set $[k] = \{0, 1, \ldots, k-1\}$. Given a vertex $v \in V$, we write $\mathrm{In}(v)$ and $\mathrm{Out}(v)$ for the set of all edges that end, respectively start, in $v$. Moreover, given an edge $e = (u, v) \in E$ we set $r_i(e) := r_i(u)$. We extend the weight functions $r_i$ to finite paths by setting $r_i(v_1 \ldots v_n) = \sum_{j=1}^{n} r_i(v_j)$. If $\pi = \pi(0)\pi(1)\ldots$ is an infinite path and $n \in \mathbb{N}$, we write $\pi{\restriction}n$ for the finite path $\pi(0) \ldots \pi(n-1)$, and we set $\varphi_i(\pi) := \liminf_{n \to \infty} r_i(\pi{\restriction}n)/n$, i.e. $\varphi_i(\pi)$ is precisely the limit-average weight of the path $\pi$ w.r.t. the weight function $r_i$. Finally, $\varphi(\pi)$ denotes the vector $(\varphi_0(\pi), \ldots, \varphi_{k-1}(\pi))$. Now consider the following linear constraints over the variables $f_{i,e}$, where $i \in [k]$ and $e \in E$:

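(1) $f_{i,e} \ge 0$ for all $i \in [k]$ and $e \in E$;

(2) $\sum_{e \in E} f_{i,e} = 1$ for all $i \in [k]$;

(3) $\sum_{e \in \mathrm{In}(v)} f_{i,e} = \sum_{e \in \mathrm{Out}(v)} f_{i,e}$ for all $i \in [k]$ and $v \in V$;

(4) $x_i \le \sum_{e \in E} f_{i,e} \cdot r_i(e) \le y_i$ for all $i \in [k]$;

(5) $\sum_{e \in E} f_{i,e} \cdot r_i(e) \le \sum_{e \in E} f_{j,e} \cdot r_i(e)$ for all $i, j \in [k]$.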
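
To make the constraints concrete, the following sketch checks whether a given candidate assignment satisfies (1)-(5), using exact rational arithmetic. It does not solve the linear program; the function name `satisfies_constraints` and the dictionary-based encoding of the graph, the weights and the candidate values are assumptions made for illustration.

```python
from fractions import Fraction


def satisfies_constraints(vertices, edges, weights, f, x, y):
    """Check constraints (1)-(5) for candidate values f[i][e].

    edges is a list of pairs (u, v); weights[i][v] is r_i(v);
    f[i][e] is the candidate value of f_{i,e}; x[i] and y[i] are the bounds.
    """
    k = len(weights)

    def r(i, e):
        return weights[i][e[0]]          # r_i(e) := r_i(u) for e = (u, v)

    def avg(i, j):
        return sum(f[i][e] * r(j, e) for e in edges)   # sum_e f_{i,e} * r_j(e)

    for i in range(k):
        if any(f[i][e] < 0 for e in edges):                     # (1)
            return False
        if sum(f[i][e] for e in edges) != 1:                    # (2)
            return False
        for v in vertices:                                      # (3)
            inflow = sum(f[i][e] for e in edges if e[1] == v)
            outflow = sum(f[i][e] for e in edges if e[0] == v)
            if inflow != outflow:
                return False
        if not (x[i] <= avg(i, i) <= y[i]):                     # (4)
            return False
        if any(avg(i, i) > avg(j, i) for j in range(k)):        # (5)
            return False
    return True


# Two vertices; r_0 rewards vertex a, r_1 rewards vertex b.
V = ["a", "b"]
E = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
weights = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
x, y = [Fraction(1, 2)] * 2, [Fraction(1)] * 2

quarter = {e: Fraction(1, 4) for e in E}
uniform = [quarter, quarter]
# Each f_i concentrates on the self-loop that is best for r_i; this violates (5).
greedy = [{e: Fraction(int(e == ("a", "a"))) for e in E},
          {e: Fraction(int(e == ("b", "b"))) for e in E}]
```

In this example the `greedy` candidate satisfies (1)-(4) but fails (5), which is the constraint that ties the $k$ frequency vectors to a single path.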

Lemma 27. If there exists an infinite path $\pi$ in $G$ such that $x \le \varphi(\pi) \le y$, then there exists a solution to (1)-(5).

Proof. Let $\pi = \pi(0)\pi(1)\ldots$ be an infinite path in $G$ such that $x \le \varphi(\pi) \le y$. Given $n \in \mathbb{N}$ and $e \in E$, define $K(n, e) := |\{j < n : (\pi(j), \pi(j+1)) = e\}|$. Moreover, for $n > 0$, set $\lambda(n, e) = K(n, e)/n$. Note that $0 \le \lambda(n, e) \le 1$ for all $e \in E$ and $n > 0$. In order to define the numbers $f_{i,e}$, let us now fix $i \in [k]$. Since $\varphi_i(\pi) = \liminf_{n \to \infty} r_i(\pi{\restriction}n)/n$, there exist natural numbers $0 < k^i_0 < k^i_1 < \cdots$ such that $\varphi_i(\pi) = \lim_{n \to \infty} r_i(\pi{\restriction}k^i_n)/k^i_n$. Now we define a sequence $\phi^i_0, \phi^i_1, \ldots$ of vectors $\phi^i_n \in [0,1]^E$ by setting $\phi^i_n(e) = \lambda(k^i_n, e)$. Since this sequence is bounded, by the Bolzano-Weierstrass theorem, there exists a converging subsequence $\psi^i_0, \psi^i_1, \ldots$ of this sequence. We set $f_{i,e} = \lim_{n \to \infty} \psi^i_n(e)$ for $e \in E$.

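We claim that the numbers $(f_{i,e})_{i \in [k], e \in E}$ form a solution of (1)-(5). That (1) holds is obvious from the definition. (2) follows from the fact that $\sum_{e \in E} \lambda(n, e) = 1$ for all $n > 0$. To show that (3) holds, fix $v \in V$. Note that we have $\sum_{e \in \mathrm{In}(v)} K(n, e) - \sum_{e \in \mathrm{Out}(v)} K(n, e) \in \{-1, 0, 1\}$ and therefore $-1/n \le \sum_{e \in \mathrm{In}(v)} \lambda(n, e) - \sum_{e \in \mathrm{Out}(v)} \lambda(n, e) \le 1/n$ for all $n > 0$. Hence, the terms $\sum_{e \in \mathrm{In}(v)} \lambda(n, e) - \sum_{e \in \mathrm{Out}(v)} \lambda(n, e)$ converge to 0 when $n$ goes to infinity. Since $\psi^i_0, \psi^i_1, \ldots$ is a subsequence of $\phi^i_0, \phi^i_1, \ldots$, the same is true for the terms $\sum_{e \in \mathrm{In}(v)} \psi^i_n(e) - \sum_{e \in \mathrm{Out}(v)} \psi^i_n(e)$. Since $\lim_{n \to \infty} \psi^i_n(e) = f_{i,e}$ exists for all $e \in E$, this implies that $\sum_{e \in \mathrm{In}(v)} f_{i,e} - \sum_{e \in \mathrm{Out}(v)} f_{i,e} = 0$, which proves (3). In order to prove (4) and (5), note that for all $i, j \in [k]$ we have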

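\begin{align*}
\varphi_i(\pi) &= \liminf_{n \to \infty} r_i(\pi{\restriction}n)/n \\
&\le \liminf_{n \to \infty} r_i(\pi{\restriction}k^j_n)/k^j_n \\
&= \liminf_{n \to \infty} \sum_{e \in E} \lambda(k^j_n, e) \cdot r_i(e) \\
&= \liminf_{n \to \infty} \sum_{e \in E} \phi^j_n(e) \cdot r_i(e) \\
&\le \lim_{n \to \infty} \sum_{e \in E} \psi^j_n(e) \cdot r_i(e) \\
&= \sum_{e \in E} f_{j,e} \cdot r_i(e).
\end{align*}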

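Moreover, if $i = j$, both inequalities are equalities since $\lim_{n \to \infty} r_i(\pi{\restriction}k^i_n)/k^i_n$ exists and equals $\varphi_i(\pi)$. Hence, $\sum_{e \in E} f_{i,e} \cdot r_i(e) = \varphi_i(\pi) \le \sum_{e \in E} f_{j,e} \cdot r_i(e)$ for all $i, j \in [k]$, which proves (5). Finally, (4) follows from the assumption that $x \le \varphi(\pi) \le y$. □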
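
The edge frequencies used in this proof are easy to compute for a concrete path prefix. The sketch below is an illustration only: the helper names `edge_frequencies` and `flow_defect` and the example path are hypothetical. It computes the frequencies $K(n,e)/n$ for a prefix with $n$ edges and the difference between incoming and outgoing frequency at a vertex, which the proof bounds by $1/n$ in absolute value.

```python
from collections import Counter
from fractions import Fraction


def edge_frequencies(prefix):
    """Return K(n, e)/n for the prefix pi(0) ... pi(n), where n > 0 edges are traversed."""
    n = len(prefix) - 1
    counts = Counter(zip(prefix, prefix[1:]))      # K(n, e) for every traversed edge e
    return {e: Fraction(c, n) for e, c in counts.items()}


def flow_defect(freq, v):
    """Incoming minus outgoing frequency at vertex v."""
    inflow = sum(p for (_, b), p in freq.items() if b == v)
    outflow = sum(p for (a, _), p in freq.items() if a == v)
    return inflow - outflow


# Prefix of the path (a b b)^omega that traverses n = 5 edges.
prefix = list("abbabb")
freq = edge_frequencies(prefix)
```

For this prefix the defect at vertex a is exactly $-1/5$: the prefix has left a once more than it has entered it.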

Lemma 28. For all $n \in \mathbb{N}$,

$$(n - 2) \cdot \sum_{j=1}^{n-1} j! < n!$$

Proof. By induction over $n$. □
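
As a sanity check, the inequality can be tested numerically for small $n$. The sketch assumes that Lemma 28 reads $(n-2) \cdot \sum_{j=1}^{n-1} j! < n!$, which is a reconstruction from the damaged formula and from the way the lemma is used later; the function name `lemma28_holds` is hypothetical.

```python
from math import factorial


def lemma28_holds(n):
    """Check (n - 2) * sum_{j=1}^{n-1} j! < n! for a single n (assumed reading of Lemma 28)."""
    return (n - 2) * sum(factorial(j) for j in range(1, n)) < factorial(n)


checked = all(lemma28_holds(n) for n in range(200))
```

Under this reading, the lemma gives $\sum_{j=1}^{n-1} j!/n! < 1/(n-2)$ for $n > 2$, so the ratio tends to 0.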

Lemma 29. Assume that $G$ is strongly connected and that there exists a solution to (1)-(5). Then there exists an infinite path $\pi$ in $G$ such that $x \le \varphi(\pi) \le y$.

Proof. Let $G$ be strongly connected and assume that there exists a solution to (1)-(5). It is well-known that if a given system of linear constraints has a solution, then there exists one in rational numbers. Let $(f_{i,e})_{i \in [k], e \in E}$ be such a solution, where w.l.o.g. $f_{i,e} = c_{i,e}/d$ with $c_{i,e} \in \mathbb{N}$ and $d \in \mathbb{N} \setminus \{0\}$. Finally, let $z \in \mathbb{R}^k$ be defined by $z_i = \sum_{e \in E} f_{i,e} \cdot r_i(e)$; by (4), $x \le z \le y$. We claim that there exists an infinite path $\pi$ in $G$ with $\varphi(\pi) = z$.

For each $i \in [k]$ consider the directed multigraph $G_i$, which is derived from $G$ by replacing each edge $e = (u, v) \in E$ by $c_{i,e}$ edges from $u$ to $v$. By (3), we have $\sum_{e \in \mathrm{In}(v)} c_{i,e} = \sum_{e \in \mathrm{Out}(v)} c_{i,e}$ for all $v \in V$. Hence, in $G_i$ each vertex has as many incoming edges as outgoing edges, which is a necessary and sufficient condition for the existence of an Eulerian cycle in each of the connected components of $G_i$. These cycles give rise to (disjoint, not necessarily simple) cycles $\gamma^i_1, \ldots, \gamma^i_m$ in $G$, where $m \le |V|$.
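
The Eulerian cycles in this step can be computed with Hierholzer's algorithm. The following is a minimal sketch, not the paper's procedure: the function name `eulerian_cycles` and the encoding of the multigraph as a dictionary from edges to multiplicities are assumptions for illustration, and the input is assumed to have equal in-degree and out-degree at every vertex.

```python
from collections import defaultdict


def eulerian_cycles(multiplicity):
    """Return one Eulerian cycle per connected component that contains an edge.

    multiplicity maps an edge (u, v) to its number of copies c >= 0; every
    vertex is assumed to have as many incoming as outgoing edge copies.
    Each cycle is a list of vertices whose first and last entries coincide.
    """
    out = defaultdict(list)
    for (u, v), c in multiplicity.items():
        out[u].extend([v] * c)            # unused outgoing edge copies of u

    cycles = []
    for start in list(out):
        if not out[start]:
            continue                      # all edges at start are already used
        stack, cycle = [start], []
        while stack:                      # iterative Hierholzer
            u = stack[-1]
            if out[u]:
                stack.append(out[u].pop())    # follow an unused edge
            else:
                cycle.append(stack.pop())     # dead end: emit vertex
        cycles.append(cycle[::-1])
    return cycles


# Two components: a <-> b with multiplicity 2 each, and a self-loop on c.
example = {("a", "b"): 2, ("b", "a"): 2, ("c", "c"): 1}
cycles = eulerian_cycles(example)
```

Each returned cycle uses every edge copy of its component exactly once, so the number of times it traverses an edge $e$ equals the multiplicity of $e$.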

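Consider for each $n \in \mathbb{N}$ the cycle $\ell^i_n$ that starts by repeating the cycle $\gamma^i_1$ $n$ times, then takes the shortest path to the first vertex in the cycle $\gamma^i_2$, repeats this cycle $n$ times, and so on, until, after repeating the cycle $\gamma^i_m$ $n$ times, it takes the shortest path back to $\gamma^i_1$. Let $M = \max_{v \in V} r_j(v)$ be the maximum weight w.r.t. $r_j$. Note that:

\begin{align*}
n \cdot \sum_{e \in E} c_{i,e} \cdot r_j(e) &\le r_j(\ell^i_n) \le n \cdot \sum_{e \in E} c_{i,e} \cdot r_j(e) + |V|^2 \cdot M, \\
n \cdot \sum_{e \in E} c_{i,e} &\le |\ell^i_n| \le n \cdot \sum_{e \in E} c_{i,e} + |V|^2.
\end{align*}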

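Hence,

$$\lim_{n \to \infty} \frac{r_j(\ell^i_n)}{|\ell^i_n|} = \frac{\sum_{e \in E} c_{i,e} \cdot r_j(e)}{\sum_{e \in E} c_{i,e}} = \frac{\sum_{e \in E} f_{i,e} \cdot r_j(e)}{\sum_{e \in E} f_{i,e}} \ge \sum_{e \in E} f_{j,e} \cdot r_j(e) = z_j,$$

where the last inequality follows from (5). Moreover, if $i = j$, we have equality, i.e. $\lim_{n \to \infty} r_i(\ell^i_n)/|\ell^i_n| = z_i$.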

The desired infinite path $\pi$ is the concatenation of finite paths $\pi_n$, where $n = 1, 2, \ldots$ The path $\pi_n$ repeats the cycle $\ell^{n \bmod k}_n$ $n!$ times and then takes the shortest path to the first state on the cycle $\ell^{(n+1) \bmod k}_{n+1}$. We will now prove that $\varphi_0(\pi) = z_0$; for all other weight functions, the proof is analogous. For all $n \in \mathbb{N}$, we have:

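\begin{align*}
\sum_{j=1}^{nk} j! \cdot r_0(\ell^{j \bmod k}_j) - nk|V|M &\le r_0(\pi_1 \ldots \pi_{nk}) \le \sum_{j=1}^{nk} j! \cdot r_0(\ell^{j \bmod k}_j) + nk|V|M, \\
\sum_{j=1}^{nk} j! \cdot |\ell^{j \bmod k}_j| &\le |\pi_1 \ldots \pi_{nk}| \le \sum_{j=1}^{nk} j! \cdot |\ell^{j \bmod k}_j| + nk|V|.
\end{align*}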

By Lemma 28, we have $\lim_{n \to \infty} \sum_{j=1}^{nk-1} j!/(nk)! = 0$. Hence, and since $|\ell^i_j|, r_0(\ell^i_j) \le j \cdot c$ for some constant $c$, we have:

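\begin{align*}
\lim_{n \to \infty} \frac{1}{nk \cdot (nk)!} \sum_{j=1}^{nk} j! \cdot r_0(\ell^{j \bmod k}_j) &= \lim_{n \to \infty} \frac{r_0(\ell^0_{nk})}{nk}, \\
\lim_{n \to \infty} \frac{1}{nk \cdot (nk)!} \sum_{j=1}^{nk} j! \cdot |\ell^{j \bmod k}_j| &= \lim_{n \to \infty} \frac{|\ell^0_{nk}|}{nk}.
\end{align*}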

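Hence,

$$\lim_{n \to \infty} \frac{r_0(\pi_1 \ldots \pi_{nk})}{|\pi_1 \ldots \pi_{nk}|} = \lim_{n \to \infty} \frac{r_0(\ell^0_{nk})}{|\ell^0_{nk}|} = z_0.$$

We have thus found a subsequence of $r_0(\pi{\restriction}n)/n$ that converges to $z_0$, which implies that $\varphi_0(\pi) = \liminf_{n \to \infty} r_0(\pi{\restriction}n)/n \le z_0$. On the other hand, using the fact that $\lim_{n \to \infty} r_0(\ell^i_n)/|\ell^i_n| \ge z_0$ for all $i \in [k]$, we can show that $\varphi_0(\pi) \ge z_0$. □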

Proof (of Theorem 18). Since the limit-average criterion is prefix-independent, it suffices to decompose $G$ into its strongly connected components (which can be done in linear time) and check for each component $C$ that is reachable from $v_0$ whether there exists an infinite path in $C$ with $x \le \varphi(\pi) \le y$. By Lemmas 27 and 29, such a path exists if and only if there exists a solution to the linear constraints (1)-(5) derived from $C$. The existence of such a solution can be checked in polynomial time (see [23]). □
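
The decomposition step of this proof can be sketched as follows. The code only computes the strongly connected components reachable from $v_0$, here with Kosaraju's two-pass algorithm, which is one of several linear-time options and not necessarily the one the authors have in mind; the feasibility check of (1)-(5) for each component would need a linear-programming solver and is not shown. The names `reachable_sccs` and `dfs_postorder` are hypothetical.

```python
def dfs_postorder(root, adj, seen):
    """Iterative DFS from root over unseen vertices; returns them in postorder."""
    order, stack = [], [(root, iter(adj[root]))]
    seen.add(root)
    while stack:
        node, it = stack[-1]
        nxt = next((w for w in it if w not in seen), None)
        if nxt is None:
            order.append(node)
            stack.pop()
        else:
            seen.add(nxt)
            stack.append((nxt, iter(adj[nxt])))
    return order


def reachable_sccs(vertices, edges, v0):
    """Return the strongly connected components reachable from v0 as a list of sets."""
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    order = dfs_postorder(v0, succ, set())      # first pass: vertices reachable from v0
    seen = set(vertices) - set(order)           # block unreachable vertices in pass two
    components = []
    for v in reversed(order):                   # second pass on the reversed graph
        if v not in seen:
            components.append(set(dfs_postorder(v, pred, seen)))
    return components


vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "c"), ("d", "a"), ("e", "e")]
components = reachable_sccs(vertices, edges, "a")
```

A component consisting of a single vertex without a self-loop contains no infinite path; for such a component the constraints are infeasible because (2) cannot hold over an empty edge set.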